So far in our course, we've almost exclusively studied parametric statistical models. At this point, we're going to turn to studying nonparametric regression models. We'll introduce the concept of a nonparametric regression model in this unit, contrast it with the parametric models we've studied so far, and then move on to a few important nonparametric regression models: kernel estimators and splines. Then finally, we'll introduce additive models and generalized additive models. We can think of generalized additive models as a blending of parametric and nonparametric models.

Let's start with an introduction to nonparametric regression modeling. The first video here is really just to compare and contrast nonparametric regression modeling with parametric modeling. So far, everything that we've looked at has been parametric, and I want to zero in on just exactly what that means. A statistical model is parametric if it is a family of probability distributions indexed by a finite set of parameters.

Let's think of an example. The normal linear regression model is a parametric model because it has the following form: the response vector y is normally distributed, and its mean depends on some finite number of parameters. Typically, we've called that number p plus one, where p is the number of predictors; the mean is a function of the predictors and of those parameters. The model also has a variance-covariance matrix, which depends on a single variance parameter. The mean has p plus one parameters, so overall we have p plus two parameters to estimate in this model. This very compact form of our normal linear regression model has a finite number of parameters, in particular p plus two. The generalized linear models that we've just studied, the Poisson and the binomial, are also examples of parametric models, because we specified the form of the model.
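The compact form being pointed to on the slide isn't reproduced in the transcript; in standard notation it is presumably:

```latex
% Normal linear regression: a parametric model
\mathbf{y} \sim N_n\!\left(X\boldsymbol{\beta},\; \sigma^2 I_n\right),
\qquad
\boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_p)^\top \in \mathbb{R}^{p+1},
\qquad \sigma^2 > 0,
```

so the mean contributes p plus one parameters and the variance one more, for p plus two in total.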
For example, a binomial response with a linear predictor and a logit link function. That form had finitely many parameters. We might contrast that with what we call a nonparametric model: a family of probability distributions with infinitely many parameters. That might sound a bit scary, but let's think about what it means in terms of an example.

Suppose that we have something that looks similar to the last slide. We have a response, and it has a normal distribution with some mean, which I'm now just calling f of x_1, and some variance-covariance matrix. But let's suppose that this function is arbitrary, where x_1 lives on an interval, say negative one to one. This could be any function with domain from negative one to one. Since the function is arbitrary on this interval, no finite set of parameters could specify its form; we would need an infinite number of parameters to specify exactly what this function is. The typical situation in statistical modeling is that there's something we don't know and that we're trying to estimate. If we really didn't know the functional form at all, we would need infinite data to pin it down exactly, and that's just not what we have. This is an example of a nonparametric model.

Another way to think about it is that, generally, we can frame statistical modeling as trying to model the mean of the response in the following way: the mean is the expected value of our response, and it's equal to some function f of our predictors. In normal linear regression, that function is linear, a linear combination of predictors and parameters. In a generalized linear model, we relaxed the distributional assumption and allowed the response to come from the exponential family. Often this changes the form of f; for example, we saw that for Poisson regression, f turned into e raised to the linear predictor.
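Written out, the nonparametric example and the general mean framing look something like the following (again reconstructing notation that isn't shown in the transcript):

```latex
% Nonparametric model from the example: an arbitrary mean function
y_i \sim N\!\left(f(x_{i,1}),\; \sigma^2\right),
\qquad f : [-1, 1] \to \mathbb{R} \text{ arbitrary};

% the general framing: model the mean of the response
\mu_i = E[y_i] = f(x_{i,1}, \dots, x_{i,p}).
```

In normal linear regression, f is the linear combination x'β; in Poisson regression with a log link, f becomes exp(x'β); in the nonparametric setting, f is left (almost) unrestricted.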
In nonparametric regression, instead of choosing f beforehand and then trying to estimate the parameters that show up in f, we will really try to learn f itself. We're trying to be much more flexible about what f can be. We'll talk about what it means to learn f in a future lesson, but here I just want to think very generally about what this might mean. We learn f by assuming that it comes from some smooth family of functions. So we're already putting some restrictions on f by assuming that it's smooth, that it has many higher-order derivatives; we're not choosing functions that are spiky. Still, the set of potential fits to the data is much larger than in the parametric approach.

In the parametric approach, for this plot here, we might choose a line, which would be a pretty bad choice. If that were the case, if we were just using normal linear regression, we would fix a line, fit it, and then realize pretty quickly, if we did some diagnostics, that the fit was bad. Then maybe we'd have to add some higher-order terms, like squared or cubic terms, and that might do reasonably well; it might not for this particular data. But notice that we are fixing the form of the function beforehand, trying it out, seeing if it works, then changing the form, trying again, and so on. That might not be super efficient. What might be more efficient in this case, if you didn't know how the data were generated, would be to leave the class of candidate functions wide open and try to learn a particular function from the data, like the curve that I've fit here, picking up on the curvature that exists there. That's ultimately what a nonparametric model will do. In the next lesson we'll figure out in what sense we can do that and what the different options are. But before we do that, let's just think about a few advantages and disadvantages of nonparametric regression.
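One concrete way to "learn f from the data", previewed here and developed in the kernel-estimator lessons, is a kernel smoother. A minimal sketch, where the function name, the bandwidth value, and the simulated data are my own illustrative choices, not from the lecture:

```python
import numpy as np

def nw_smoother(x_train, y_train, x_eval, bandwidth=0.2):
    """Nadaraya-Watson kernel estimate of f at the points x_eval.

    Each fitted value is a locally weighted average of the observed
    responses, with Gaussian weights that shrink with distance.
    """
    # scaled pairwise distances, shape (len(x_eval), len(x_train))
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)              # Gaussian kernel weights
    return (w @ y_train) / w.sum(axis=1)

# simulated curved data on [-1, 1] -- a line would fit this badly
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 200))
y = np.sin(3 * x) + rng.normal(0, 0.2, 200)

fit = nw_smoother(x, y, x)  # tracks the curvature without a fixed form
```

No functional form was fixed in advance; the bandwidth controls how smooth the learned curve is, which is exactly the kind of choice the later lessons discuss.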
For advantages: the nonparametric approach is more flexible. When modeling new data for which you have little past information, you're not really sure what the relationship looks like, or whether there even is one; you certainly don't know that it's linear. The nonparametric approach can be more effective there and can help you learn that relationship. It assumes far less about the form of the model, so it's less liable to make major mistakes that result in bias. The parametric approach, on the other hand, can result in bias if we choose the wrong form of the model. We've seen this a bit: suppose that your data are actually quadratic and you choose to fit a line to that data; you will have bias in your model. That would be problematic, and of course in higher dimensions it's much harder to know whether your data were generated by some complicated nonlinear relationship. If you just fit a line, you have bias. If you use a nonparametric approach and do it correctly, you can cut down on the amount of bias, because you're learning about the structure that's there rather than just assuming that it's, for example, linear.

Well, those sound like really promising features of a nonparametric model, and we'll see how some of them play out soon. Of course, there are disadvantages. One is that the parametric approach is more efficient if the chosen model form is correct. If you know something about the model form, if you know that it's linear or you know what the curved structure is, then it's probably better to use the parametric approach; the nonparametric approach will be less efficient when that structure is available. Another disadvantage of the nonparametric approach is that it's more difficult to interpret. Remember, we've spent a decent amount of time thinking about the easy interpretation of parameters in parametric models.
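The quadratic-data example of misspecification bias can be made concrete with a small simulation; the data-generating choices below are my own, not from the lecture:

```python
import numpy as np

# data whose true mean is quadratic in x
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 300)
y = x**2 + rng.normal(0, 0.1, x.size)

# parametric fit with the wrong form: a straight line
slope, intercept = np.polyfit(x, y, 1)
line_fit = intercept + slope * x

# the residuals are systematically curved -- positive near the ends
# of the interval, negative in the middle -- because a line cannot
# bend; no amount of extra data removes this bias
resid = y - line_fit
edge_mean = resid[:20].mean()        # left edge of the interval
middle_mean = resid[140:160].mean()  # center of the interval
```

A standard diagnostic, plotting these residuals against x, would show the tell-tale U-shape that signals a misspecified mean function.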
The linear model, the Poisson regression model, the binomial model: all of those had nice interpretations for parameters, and that has consequences. It has consequences for coming up with explanations as to why changes in your predictors would result in changes in your response, and it has ethical implications. Often, if you don't have a good reason why predictors and responses are related, if you can't explain it, you might still use the model to make predictions, but if those predictions are unjust, that's problematic. With nonparametric models, we don't really have a formulaic way of describing the relationship between the predictors and the response. Often graphical approaches will be necessary, but we know that graphics can be difficult with high-dimensional data, so that becomes problematic. We'll see in a later video that a generalized additive model can be a nice balance between the parametric and nonparametric approaches; generalized additive models are sometimes called semiparametric, because in certain forms they allow for interpretation of some parameters and not others. We'll describe generalized additive models in more detail later. But for right now, I just want us to keep in mind one disadvantage of the nonparametric approach: potential difficulty in interpreting the model.