Now, we've been using regression as a model-fitting technique throughout this course. One of the objectives of any designed experiment is to end up in a situation where you can use the data from that experiment to fit an empirical model that gives you a good quantitative explanation for what you've learned from the experiment. So you've seen a lot of regression modeling in earlier classes, but in this module we're going to go into the topic in a bit more detail and really give you the fundamental background of regression modeling techniques. We're going to talk about just about everything in this chapter: the idea of fitting linear regression models, doing inference on the models, that is, hypothesis testing and confidence intervals, and then model diagnostics, that is, how do we know the model is a good fit to the data.

Let's introduce the idea in a simple way. There are a lot of fields where we have two or more variables that are related in some way. We don't necessarily have a theoretical model that relates them, but we would like to find an empirical model that we can use to explore this relationship. For example, chemical process yield may be related to temperature. There may not be a really good physical or mechanistic model that describes the situation, but we have data, and our chemical engineer might want to build a model relating yield to temperature and then use that model for making predictions, for controlling the process, or maybe even for optimization purposes if he can control the temperature. In general, in regression problems we think of having a single dependent variable y, which we usually call the response variable. This variable depends on k independent variables, or regressor or predictor variables, usually represented by xs: x_1, x_2, on up to x_k. We have some sort of mathematical model that we propose to relate the xs to y; this is called a regression model. Then we use the data we have available to fit this regression model. In some cases we have a pretty good idea what that relationship should be, but in most real cases of regression the true functional relationship is unknown, so we use some sort of approximating function instead of the true function. Low-order polynomials are very widely used to do this. Regression models are empirical models. They're not mechanistic models like Ohm's law, where there's an underlying physical theory; they're empirical. We often use regression as a model-building technique on unplanned or undesigned data, but regression is also used extensively to build models for data from designed experiments, and we've seen plenty of examples of that in previous modules in the course.

So we're going to focus initially on fitting linear regression models. Here's a hypothetical example. Suppose we want to develop an empirical model that relates the viscosity of a polymer to the temperature and the catalyst feed rate. So viscosity is y, and temperature and catalyst feed rate are x_1 and x_2, respectively. Equation 10.1 is a linear regression model that relates these variables: y is equal to Beta_0, the intercept, plus Beta_1 times x_1 plus Beta_2 times x_2 plus an error term, where y is the viscosity, x_1 is the temperature, and x_2 is the catalyst feed rate. This is an example of a multiple linear regression model with two predictors or regressors.
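Here is a minimal Python sketch of what fitting a model like Equation 10.1 looks like in practice. The regressor ranges, coefficient values, and noise level are made up purely for illustration; they are not the viscosity data from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data for the viscosity model of Equation 10.1:
#   y = beta0 + beta1*x1 + beta2*x2 + error
# All numbers below are invented for illustration only.
n = 16
x1 = rng.uniform(80, 100, n)                              # temperature
x2 = rng.uniform(8, 13, n)                                # catalyst feed rate
y = 1500.0 + 7.5 * x1 + 30.0 * x2 + rng.normal(0, 20, n)  # simulated viscosity

# Model matrix: a column of ones for the intercept plus the two regressors.
X = np.column_stack([np.ones(n), x1, x2])

# Fit the multiple linear regression model by least squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated beta0, beta1, beta2:", beta_hat)
```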
The reason this is called a linear regression model is that it's a linear function of the unknown parameters Beta_0, Beta_1, and Beta_2. This model describes a plane in the two-dimensional x_1, x_2 space. Very often these coefficients Beta_1 and Beta_2 are called partial regression coefficients because, for example, Beta_1 expresses the expected change in the mean of y per unit change in x_1 when x_2 is held constant, and Beta_2 measures the expected change in the mean of y per unit change in x_2 when x_1 is held constant. In general, there may be k regressors, which would lead to a multiple linear regression model with k predictors, such as you see in Equation 10.2. The parameters here, the Betas, are often simply called regression coefficients. The general case, Equation 10.2, describes a hyperplane in the k-dimensional space of the regressor variables.

Models that are more complex than the ones we've just looked at can still be analyzed by multiple regression methods. For example, suppose we add an interaction term to that two-variable multiple regression model, so now you've got a term Beta_12 times x_1 times x_2. That's an interaction term. It turns out we still have a linear regression model. One way you can see that is simply to let a new variable x_3 be equal to the product of x_1 and x_2 and let a new coefficient Beta_3 equal Beta_12; then you can write Equation 10.3 in the form of Equation 10.4, which is a standard multiple regression model with three variables. We've seen interaction terms added to regression models in some of the examples in earlier modules, particularly Chapters 6, 7, and 8, where we looked at regression modeling to capture the results from two-level factorial designs. Equation 10.5 is another example. In Equation 10.5, we have the two linear terms, two squared terms, and an interaction term, so this is a second-order response surface model. Again, this is still a model that is linear in the unknown Betas, so we can fit it using standard multiple linear regression methods (see the short sketch at the end of this passage).

How do we do that? Well, let's talk about the method we use. The model-fitting technique is called the method of least squares. We're going to assume that the error term in our model has expected value zero and constant variance Sigma squared, and that the errors, the epsilons, are uncorrelated. The data for our regression model are in Table 10.1. This is the general case where we have k regressors or predictors and n observations; n, by the way, has to be greater than k. You notice that the xs have two subscripts: the first subscript tells you the observation number and the second subscript identifies which variable it is. We can write the regression model in terms of these observations. Equation 10.7 is the general regression model written in terms of the data that you see in the table: y_i is equal to Beta_0 plus Beta_1 times x_i1 plus Beta_2 times x_i2, all the way on out to Beta_k times x_ik, plus Epsilon_i. That's the ith observation, the ith row in that table. The method of least squares chooses the Betas in this equation so that the sum of the squares of the errors is minimized. Equation 10.8, the least squares function L, was found by summing up the squares of the errors, the Epsilons, after solving the model equation for Epsilon_i and substituting that into the expression for L. How do you go about minimizing L? Take the derivatives of L with respect to the model parameters and set them equal to zero.
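To make concrete the point that interaction and squared terms still give a linear regression model, here's a small Python sketch with made-up data in which the interaction term of Equation 10.3 and the second-order terms of Equation 10.5 are simply extra columns of the model matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data, for illustration only.
n = 20
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
y = 5.0 + 2.0 * x1 - 3.0 * x2 + 1.5 * x1 * x2 + rng.normal(0, 0.2, n)

# Interaction model (Eq. 10.3): y = beta0 + beta1*x1 + beta2*x2 + beta12*x1*x2 + error.
# Letting x3 = x1*x2 and beta3 = beta12 turns it into the standard three-variable
# linear model of Eq. 10.4 -- still linear in the betas.
x3 = x1 * x2
X_interaction = np.column_stack([np.ones(n), x1, x2, x3])
beta_interaction, *_ = np.linalg.lstsq(X_interaction, y, rcond=None)

# The second-order response surface model (Eq. 10.5) works the same way:
# the squared terms are just two more columns, so the model is still linear in the betas.
X_second_order = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])
beta_second_order, *_ = np.linalg.lstsq(X_second_order, y, rcond=None)

print("interaction model coefficients:", beta_interaction)
print("second-order model coefficients:", beta_second_order)
```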
When you do that, you get a set of equations that looks like Equation 10.10. This is a set of p = k + 1 equations in p = k + 1 unknowns, the Betas. These are called the least squares normal equations; there is one equation for each of your model regression coefficients. The solution to this set of equations gives the least squares estimates of your model coefficients, and we usually denote those estimates by putting a little hat over the Betas.

It's a whole lot simpler and easier to manipulate things if we write the model and the normal equations in matrix notation, and this is the standard way we do it. We can write the model that was presented earlier in terms of the observations, Equation 10.7, in compact matrix notation as y = X Beta + Epsilon, where y is an n by 1 vector of the observations, X is an n by p matrix that represents the levels of the independent variables, Beta is a p by 1 vector of your model coefficients, and Epsilon is an n by 1 vector of errors. If you look at the matrix X, you'll notice that most of it is just the data table for your regression data set; we add a column of ones on the left-hand side to account for the intercept term that's going to be in the model.

The least squares function L can be written as Epsilon prime Epsilon, which is simply the sum of the squares of the errors. If we solve for Epsilon, that's y minus X Beta, so we substitute that into L and we get the expression for L that you see at the top of the slide. Let's expand that. When we expand it, we get the first line of Equation 10.11. Notice that every single term in that line is a scalar quantity, because each one starts with a row vector, a primed vector, and ends with a column vector. That makes sense, because the least squares function is a scalar quantity. Now look at the two middle terms: one is just the transpose of the other, so they're the same scalar. I can write y prime X Beta as Beta prime X prime y, those two terms combine, and that gives me the second line of the equation for L. I now want to take the derivative of that with respect to the vector Beta. This is what the derivative looks like; notice I've now substituted Beta-hat for Beta. When we equate it to zero and simplify, we get X prime X times Beta-hat equal to X prime y. That is the matrix form of the least squares normal equations, Equation 10.12; if you were to multiply it out, you would get Equation 10.10. To solve the normal equations, all we have to do is get the inverse of X prime X. Then we can multiply both sides of 10.12 by that inverse, and Beta-hat becomes X prime X inverse times X prime y. We're assuming here, of course, that the inverse of X prime X exists, and in all the problems we're going to be talking about, that's not going to be an issue.

Your fitted regression model is simply y-hat equal to X times Beta-hat. That is, we can take each observation, which is a row of the X matrix, multiply it by Beta-hat, and that gives us the corresponding fitted or predicted value of y. In scalar notation, y-hat sub i is just Beta-hat_0 plus the sum of each Beta-hat_j times the corresponding x_ij for the ith observation. The difference between an actual observation and the corresponding fitted value is a residual.
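Here is a short sketch, again with made-up data, of the matrix calculations just described: forming X prime X and X prime y, solving the normal equations for Beta-hat, and then computing fitted values and residuals. In practice you would usually solve the linear system rather than explicitly form the inverse, but the result is the same Beta-hat.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up data just to exercise the matrix formulas: n observations, k = 2 regressors,
# so p = k + 1 columns in X (the first column of ones is for the intercept).
n, k = 12, 2
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, size=(n, k))])
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + rng.normal(0.0, 0.1, n)

# Least squares normal equations in matrix form: (X'X) beta_hat = X'y,
# so beta_hat = (X'X)^(-1) X'y.  np.linalg.solve applies the inverse
# without forming it explicitly.
XtX = X.T @ X
Xty = X.T @ y
beta_hat = np.linalg.solve(XtX, Xty)

y_hat = X @ beta_hat      # fitted (predicted) values, y_hat = X beta_hat
e = y - y_hat             # residuals, e = y - y_hat

print("beta_hat:", beta_hat)
```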
We can write the residuals in vector form as e equal to the vector of observations y minus the vector of predicted values y-hat. It's usually necessary in fitting a linear regression model to get an estimate of Sigma squared, and it's pretty easy to do that. Consider the sum of squares of the residuals, which I'm going to call SSE, the error sum of squares. In matrix form the error sum of squares is just e prime e. Now, remember e is just y minus X Beta-hat, so substituting X Beta-hat for y-hat in y minus y-hat, we get an expression for the error sum of squares. We can expand it just as we did previously with the least squares function, and we find that the two middle terms are the same, so we can combine them. Because X prime X times Beta-hat is equal to X prime y, the last term, Beta-hat prime X prime X Beta-hat, is the same as Beta-hat prime X prime y, and so the expression reduces to SSE equal to y prime y minus Beta-hat prime times X prime y. This is called the error or residual sum of squares, and it has exactly n minus p degrees of freedom. You can show that the expected value of this error sum of squares is Sigma squared, the variance of the errors, times n minus p. So an unbiased estimate of Sigma squared is found by dividing the error sum of squares, something we can easily compute, by the number of degrees of freedom, n minus p. This quantity used to estimate Sigma squared is sometimes called the mean square for error in regression.
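Continuing the same kind of sketch (the data are again invented for illustration), here is how the error sum of squares and the estimate of Sigma squared fall out of the matrix quantities we already have.

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up data: n observations, k = 2 regressors, so p = k + 1 = 3 model coefficients.
n, k = 15, 2
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, size=(n, k))])
y = X @ np.array([2.0, 1.0, 0.5]) + rng.normal(0.0, 0.3, n)
p = X.shape[1]

# Least squares fit via the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Error (residual) sum of squares, computed two equivalent ways:
e = y - X @ beta_hat
sse_from_residuals = e @ e             # e'e
sse = y @ y - beta_hat @ (X.T @ y)     # y'y - beta_hat' X'y

# Unbiased estimate of sigma^2: the mean square for error, SSE / (n - p).
sigma2_hat = sse / (n - p)

print("SSE (both ways):", sse_from_residuals, sse)
print("estimate of sigma^2:", sigma2_hat)
```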