Now, let's look at hierarchical linear regression. We want to apply the same multi-level modeling techniques that we discussed earlier to linear regression problems as well. As mentioned above, this is particularly useful when we're presented with imbalanced subgroups of sparse data. In this example, we're going to create some data with 8 subgroups. Seven of the subgroups have 20 data points each, and the last one has a single data point. The input data for all 8 groups is generated from a normal distribution with a mean of 10 and a standard deviation of 1. The parameters for the linear model are generated from normal and beta distributions.

So let's look at the code example here. N corresponds to the number of data points in every subgroup except for the last one, and M corresponds to the total number of subgroups. We then generate the parameters alpha, beta, and epsilon by drawing from a normal, a beta, and a normal distribution, respectively. We draw sample values for x, the input data, from a normal distribution, and then we generate y, the output, using the drawn values for alpha, beta, epsilon, and x. If we make a scatter plot of the data, you'll notice that the first 7 groups have 20 data points each, while the last one only has a single data point.

Now, let's build a non-hierarchical model first so that we have something to compare against. We also mean-center the data to make it easier for the sampler to converge. We're then going to draw what's called a forest plot for our parameters. What's interesting here is that the obtained alpha and beta parameters vary for each group, particularly for the last one, which has a fairly large variance or spread of values. The syntax for the forest plot is that you pass the trace object to it, along with the list of the parameters you want plotted; combined simply means that we want them all to be plotted on the same chart.
So now, if we scroll down, we see a list of the parameters on the y-axis: alpha 0 through alpha 7, and beta 0 through beta 7. The x-axis shows the range of values that these parameters can take. Along with the range of values, we also see the 94% credible interval plotted for each of these parameters. What's interesting here is that alpha 7 has a really large spread of values, and a similar situation occurs for beta 7 as well. Unfortunately, this is what happens when we use a non-hierarchical model and we have very few data points in some of the subgroups. We can fix this problem using a hierarchical model.

So now, let's reformulate our problem to use a hierarchical model. We do this by setting hyperpriors on the alpha and beta parameters. Or, to be more precise, the hyperpriors are applied to the scaled version of alpha, alpha_tmp. Notice that now, instead of having constant values for the mean and standard deviation parameters of alpha and beta, we're setting these values to be drawn from another distribution. These are called the hyperpriors. The hyperprior distributions themselves take hyperparameter values, which are constants. We then set up the likelihood function using a Student's t-distribution, as we see here. You can recover the original alpha from the scaled, or centered, alpha using these equations here. And since the mean of alpha is now itself a distribution, you can also determine the parameters for this original alpha using a similar set of equations, as you see here. We then sample from this model so we can get estimates for alpha and beta. We then generate three sets of plots. The first one is a forest plot, so that we can visualize the distributions of our parameters alpha and beta.
We then generate scatter plots for all 8 groups, along with a regression line for each of these groups using the estimates for alpha and beta. More specifically, we use the means of the alpha and beta estimates to generate these regression lines. And finally, since hierarchical models can be fairly complex, it might be beneficial to visualize them using plate notation.

If you look at the forest plot for these parameters, you'll see that alpha 7 and beta 7 now have reasonable values, as opposed to earlier, when alpha 7 and beta 7 had fairly large distributions. In the non-hierarchical model, if you look at the x-axis, alpha 7 extended almost all the way from -200 to 200. And even though beta 7 wasn't quite as large, you can see that its distribution was still fairly large compared to the other parameters. In the hierarchical model, however, you'll notice that the distributions of alpha 7 and beta 7 are almost the same size as the other parameters' distributions.

If you look at the scatter plot of the 8 subgroups now, you'll notice that we have the data points along with the regression lines that we drew using the means of the alpha and beta estimates we were able to infer. What's interesting here is that the last subgroup has only one data point, yet we're still able to fit a regression line through it. This is because the hierarchical model is able to use information from the other groups, and that allows us to fit a regression line through this single data point. This would not have been possible in a non-hierarchical model.
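The per-group scatter-plus-regression-line figure can be sketched as below. To keep the snippet self-contained and fast, it uses stand-in posterior means on toy data; in a real run you would instead take `idata_h.posterior["alpha"].mean(("chain", "draw"))` and likewise for beta from the hierarchical trace, and the plate-notation diagram comes from PyMC's `pm.model_to_graphviz(hierarchical_model)`:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(314)

# Toy data with the same shape: 7 groups of 20 points, 1 group of 1 point
N, M = 20, 8
idx = np.append(np.repeat(np.arange(M - 1), N), M - 1)
x = rng.normal(10, 1, size=len(idx))
y = 2.5 + 0.9 * x + rng.normal(0, 0.5, size=len(idx))

# Stand-in posterior means; in practice these come from the trace
alpha_mean = np.full(M, 2.5)
beta_mean = np.full(M, 0.9)

# One panel per subgroup: scatter the data, then draw the regression
# line from that group's (mean) alpha and beta estimates
fig, axes = plt.subplots(2, 4, figsize=(12, 5), sharex=True, sharey=True)
x_range = np.linspace(x.min(), x.max(), 10)
for j, ax in enumerate(axes.ravel()):
    ax.scatter(x[idx == j], y[idx == j])
    ax.plot(x_range, alpha_mean[j] + beta_mean[j] * x_range, "k")
    ax.set_title(f"group {j}")
fig.savefig("hierarchical_groups.png")
```

Note how the last panel still gets a regression line even though it contains a single point; with real hierarchical estimates, that line is informed by the other seven groups.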