In this class, we're going to talk about a special case of the two-sample t-test, which leads to something called the paired t-test. And it's all about something called a paired experiment. In a lot of simple comparative experiments, where we have a single factor with two levels, we can greatly improve the precision of our test by making our comparisons within matched pairs of experimental material. Let me give you a simple example of this.

Suppose we have a machine that tests hardness, like a Rockwell hardness tester. The way this machine works is it presses a rod with a pointed tip into a metal specimen with a known force, and by measuring the size of the depression caused by the tip, you can determine the relative hardness of the specimen material. Now, this machine has two different tips. Why are they different? I'm not sure; maybe one of them came from the original equipment manufacturer's parts list, and the other one came from a different vendor. The precision of these two tips seems to be pretty similar, but it's suspected that one tip produces different mean hardness readings than the other.

So how could we investigate this experimentally? Well, one way would be to select a number of metal specimens at random, let's say 20, and then test half of them with tip 1 and the other half with tip 2. You could randomly assign specimens to tips, so that you could view this as a completely randomized experiment. Then the average hardness could be compared using one of the two-sample t-tests that we talked about before: either the pooled t-test or the two-sample t-test with unequal variances.

Now, if you think about this a little bit, there are some issues here. Suppose these metal specimens were cut from different pieces of bar stock, and those different bar stock samples were produced in different heats, or maybe they're not exactly homogeneous in some other way that might potentially affect hardness. This lack of homogeneity between the specimens contributes to the variability of the hardness measurements; it tends to inflate that variability and makes the experimental error larger. And that could make it harder to detect a true difference between the tips.

So to protect against this, let's consider an alternative design. Suppose each specimen is large enough that you could make two hardness determinations on it. Our alternative design would consist of dividing each specimen into two parts, and then randomly assigning one tip to one part of the specimen and the other tip to the remaining part. The order in which the tips are tested would be randomly selected, and which portion each tip gets would be randomly selected as well. If we conduct this experiment with a total of 10 specimens, we get the coded data that you see in Table 2.6. These are the, say, Rockwell hardness measurements.

Let's write down a statistical model for the data from this experiment. This is a very common thing for us to do in experimental design work: propose a statistical model for the experiment that we've run. y sub ij, the measurement for tip i on specimen j, is equal to a mean mu sub i, which is the mean for that particular tip, plus beta sub j, which is the specimen effect, okay? Or the coupon effect, plus epsilon sub ij, which is random error.
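Written out symbolically, that model looks like this; it's just a restatement of what I said, with mu sub i the mean for tip i, beta sub j the specimen (or coupon) effect, and epsilon sub ij the random error:

```latex
% Statistical model for the paired hardness experiment, in the lecture's notation:
% mu_i = mean for tip i, beta_j = effect of specimen (coupon) j, eps_ij = random error.
\[
  y_{ij} = \mu_i + \beta_j + \varepsilon_{ij},
  \qquad i = 1, 2, \quad j = 1, 2, \ldots, 10 .
\]
```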
And we're going to assume that the error term is normal, with the usual sorts of assumptions about independence and zero mean.

So how would we go about analyzing the data from this experiment? Well, I want you to consider the difference for the jth pair of observations: d sub j equals y sub 1j minus y sub 2j. Take the expected value of that difference. It's pretty simple to show that the expected value of that difference reduces to mu 1 minus mu 2, just the difference in the two tip means. That is, the specimen effect beta sub j cancels out, and it's the pairing of the observations on the specimens that causes that to happen. So pairing basically removes the specimen effect from the difference in means. Testing that the tip means are equal is now equivalent to testing that the mean of the differences, mu sub d, is equal to 0, against the alternative that says it's not. That's a single-sample t-test. So the appropriate test statistic is Equation 2.41: d bar is the sample average of the differences, s sub d is the standard deviation of the differences, and n is the number of pairs. Here are the equations for calculating d bar, which is just the sample average, and below that is the equation for calculating the standard deviation of the differences; that's a straightforward application of the standard deviation formula as well. The null hypothesis mu sub d equals 0 would be rejected if the absolute value of t0 exceeds the upper alpha-over-2 percentage point of t with n minus 1 degrees of freedom, and it's n minus 1 degrees of freedom because you had n specimens, that is, n pairs. You can also use a p-value approach. And as I said at the beginning, this procedure is usually called the paired t-test.

Here are the paired data and the differences; I simply calculated the differences from Table 2.6. The average difference is minus 0.10, and the standard deviation of the differences turns out to be 1.20. We're going to choose alpha of 0.05, and that means the critical value is the upper 2.5% point of t with 9 degrees of freedom. Why the 2.5% point with alpha of 0.05? Because it's a two-sided test. So the critical region is bounded at 2.262. Let's compute the paired t-test statistic. If we plug the numbers into the test statistic that I showed you a moment ago, the computed value of t0 is minus 0.26, and because the absolute value of that, 0.26, is not greater than the critical value 2.262, we cannot reject the null hypothesis. That is, there is no evidence to indicate that the two tips produce different mean hardness readings.

There's another way to visualize this. Compute the pooled standard deviation for the two samples; that is, take those two sets of data and compute the pooled standard deviation, or pooled variance. You can show that the expected value of the pooled variance, the expected value of s squared sub p, is equal to sigma squared plus another term involving the sum of the squares of the block effects, the beta sub j's, the coupon effects. Now, what does that tell you? It tells you that those differences between the blocks, the specimens, inflate your error variance. That's why we call blocking a noise reduction experimental design technique: it removes the variability due to the blocks, in this case the coupons, from your variance estimate.

Another way to see this is in terms of a confidence interval. Let's use the paired data and find a confidence interval on mu 1 minus mu 2; that's the same as a confidence interval on mu sub d.
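Here is a minimal sketch of that whole calculation in Python, using only the summary values quoted above (average difference minus 0.10, standard deviation of the differences 1.20, 10 pairs) rather than the raw Table 2.6 readings; with the raw data you could instead pass the two columns of readings directly to scipy.stats.ttest_rel.

```python
# Minimal sketch of the paired t-test and confidence interval, using the
# summary values quoted in the lecture rather than the raw Table 2.6 data.
import math
from scipy import stats

d_bar = -0.10   # average of the paired differences
s_d = 1.20      # standard deviation of the differences
n = 10          # number of pairs (specimens)

# Paired t statistic: t0 = d_bar / (s_d / sqrt(n))
se = s_d / math.sqrt(n)
t0 = d_bar / se

# Upper 2.5% point of t with n - 1 = 9 degrees of freedom (two-sided test, alpha = 0.05)
t_crit = stats.t.ppf(0.975, df=n - 1)

# Two-sided p-value
p_value = 2 * stats.t.sf(abs(t0), df=n - 1)

# 95% confidence interval on mu_d (equivalently, on mu_1 - mu_2)
half_width = t_crit * se
ci = (d_bar - half_width, d_bar + half_width)

print(f"t0 = {t0:.2f}, critical value = {t_crit:.3f}, p-value = {p_value:.2f}")
print(f"95% CI on mu_d: {d_bar:.2f} +/- {half_width:.2f} = {ci}")
```

Running that reproduces the numbers on the slides: a t statistic of about minus 0.26 against a critical value of 2.262, and an interval of minus 0.10 plus or minus roughly 0.86, which is the confidence interval we're about to look at.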
Well, the 95% confidence interval is shown in about the middle of the slide here, and it's minus 0.10 plus or minus 0.86. In other words, the half-width of the interval is 0.86. On the other hand, if you use the pooled analysis and assume that these are two independent samples, look at your 95% confidence interval now. The midpoint is still minus 0.10, but look at the length of the interval: it's plus or minus 2.18. It's considerably longer, about 2.5 times longer than the paired interval. Why? Because the block effects, the specimen effects, are large, and those specimen or coupon or block effects are inflating the estimate s squared sub p that's used in the confidence interval you see at the bottom of the slide. So pairing is a design strategy that can be very, very useful when looking at simple comparative experiments.

That's it for Module 2. Thanks for listening. In our next set of classes, we're going to move on to Module 3, which is all about single-factor experiments with more than two levels. Thanks a lot.