From this lesson onwards, we extend our hypothesis test procedure to a situation where we wish to make comparisons between population means across two different populations. These are called difference in means tests. In this lesson, we'll begin with a hypothesis test involving a single population and use that to introduce the difference in means test, which we take up in the subsequent lesson. Along the way, we get to understand conceptually what a difference in means hypothesis test is. We will now extend the hypothesis test for a population mean to a situation where we are interested in comparing the population means across two different populations. These tests are termed difference in means tests, and we will introduce them by way of an example. The file Athletes.xlsx contains data on a sample of Olympic athletes from some past Olympic Games. In particular, it consists of the name of the athlete, height and weight information, the country of the athlete, and the athlete's gender. Let us take a look at this file. The very first column of the file is the name of the Olympic athlete, followed by the height of the athlete in centimeters and the weight of the athlete in kilograms. Column D is the country that the athlete represents, and finally, column E is the gender of the athlete. An empirical study using data on heights of people claimed that the average height of men aged 18 to 45 years across the world was 173 centimeters, which is a little over 68 inches. This study included men not only from the sports fraternity but from across a wide spectrum of professions and walks of life. One could argue that men Olympians are likely to be taller than this claimed average height of 173 centimeters. Reasons could be many. For example, one reason could simply be a matter of self-selection: only the best-performing athletes get into the Olympics, and it is likely that their average height is more than the average height across the general population of men.
One could set up a hypothesis to test the claim that the average height of men athletes at the Olympics is higher than the average height of men across all walks of life, which the empirical study found to be 173 centimeters. We can use the sample data in our Excel file Athletes.xlsx to carry out this hypothesis test. That is, we will be testing whether or not we can reject the claim that the average height of men Olympians is greater than 173 centimeters. Remember, we only have sample data on heights of men athletes at the Olympics. Based on this sample data, we wish to evaluate a claim being made about the average height across the entire population of men athletes at all Olympic Games. This is where hypothesis testing comes in. So in step one of the hypothesis test, our null hypothesis is that the average height of men athletes at the Olympics is greater than 173 centimeters. Note that it is not greater than or equal to, but rather, strictly greater than 173 centimeters. That is, mu of height, which is the population mean of heights across all men athletes at all Olympics, is greater than 173 centimeters. Remember, the hypothesis is always about the population mean, because that is the quantity that we do not know, and we are trying to test some claims about it using our sample information. The alternate hypothesis is that the population average height across men Olympians is less than or equal to 173 centimeters. Notice that we end up having a strict inequality in the null hypothesis, which, based on our guidelines given in an earlier lesson, is not admissible. So we simply flip the null and alternate hypotheses. Thus the new null hypothesis is that the population mean height of men athletes at the Olympics is less than or equal to 173 centimeters. The alternate hypothesis is that the population mean height of these athletes is greater than 173 centimeters. These are the new null and alternate hypotheses that we will be testing.
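Written in symbols, the flip described above can be summarized as follows. The labels H_0 and H_a and the subscripted mu are assumed shorthand for "null hypothesis", "alternate hypothesis" and "mu of height"; they are not taken from the lesson's slides.

```latex
\begin{align*}
\text{First attempt (not admissible):}\quad & H_0:\ \mu_{\text{height}} > 173 & H_a:\ \mu_{\text{height}} \le 173 \\
\text{After flipping (tested here):}\quad & H_0:\ \mu_{\text{height}} \le 173 & H_a:\ \mu_{\text{height}} > 173
\end{align*}
```

The strict inequality ends up in the alternate hypothesis, so the null hypothesis contains the equality, as the earlier guideline requires.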
Lest there be confusion, we should erase the earlier null and alternate hypotheses. This is a one-tail test with the rejection region on the right-hand side. Remember the tip: since the mouth of the inequality in the null hypothesis opens on the right-hand side, the rejection region is on the right-hand side once the problem has been translated onto a t-distribution. Also observe that there is a single population mean we are interested in, the population mean height of men athletes at the Olympics. We have not yet introduced the concept of difference in means. We will do that after we carry out this hypothesis test. Step two of the hypothesis test is to calculate the t-statistic. By doing this, we translate the problem onto a t-distribution, a necessary step to proceed with the hypothesis test. The t-statistic is given by x-bar minus mu of height, divided by s over the square root of n, that is, t = (x-bar - mu of height) / (s / sqrt(n)), where x-bar is the sample mean and mu of height is the claimed population mean around which we are doing the hypothesis test. s is the sample standard deviation, and n is the sample size. Let us calculate this in Excel. To calculate the t-statistic, we need to calculate x-bar and s. x-bar is the sample average height of all the male athletes, and s is the sample standard deviation of the heights of all the male athletes. However, notice that in this file we have data on both male and female Olympic athletes, so we first need to segregate the data. One way could be to select the data and sort it by gender. So now what has happened is that all the male athletes' data have been put in one place, so you could appropriately calculate x-bar and s by selecting data only for the male athletes. Another approach could be to use a pivot table and then calculate the averages. However, a wrong approach would be to filter the data without creating a pivot table. So suppose I filter the data and select only the male gender.
Now, although in my file I see only the male athletes, when I calculate the sample average and the sample standard deviation, I would be getting a wrong answer, because functions such as AVERAGE and STDEV.S still include the rows hidden by the filter. So we'll not be following this filtering approach to calculate the t-statistic, and I'll remove the filter. Rather, we'll follow the pivot table approach. To insert the pivot table, we go to Insert > PivotTable. I select my entire data range, and I'll create the pivot table in a new worksheet rather than putting it in the existing worksheet. So a pivot table template is produced. Let me increase the font size here, the resolution. So now I'll take the name of the athlete and drag it where it says Row Labels. So I have all my athletes in column A. Then I'll take the height information and drag it where it says Values, so I have the athletes and their corresponding heights. Note that the height is given in terms of Sum of Heights. In this case, it really doesn't matter because there's only one height per athlete. Nevertheless, let us change it to Average of Heights. So I'll go to Value Field Settings, change "Summarize value field by" to Average, and click OK. So I have my athlete and the height of that athlete. Next, I need to segregate these athletes based on gender, since I only need the male athletes. So I can take the gender and filter the report based on gender by dropping it where it says Report Filter. So now I have a Gender tab inserted in the very first row, and I can select only the male gender and click OK. So now I have information only on the male athletes. And notice that, unlike filtering without using a pivot table, in this case, when I calculate my sample average and the sample standard deviation, I would get a correct answer. So now, let me calculate my x-bar. I'll also calculate my s, and I'll also calculate my n, the number of male athletes. x-bar is given by AVERAGE, and I select this column.
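For readers who want to check the same quantities outside Excel, here is a minimal sketch in Python using only the standard library. The records, the field names (`height_cm`, `gender`) and the helper `one_sample_t` are all invented for illustration; they are not the contents or layout of Athletes.xlsx. The point is that the male rows are selected first, and only then are x-bar, s, n and the t-statistic computed.

```python
import math
import statistics

# Hypothetical records standing in for rows of the workbook.
# All values and field names here are made up for illustration.
athletes = [
    {"name": "Athlete 1", "height_cm": 180.0, "gender": "M"},
    {"name": "Athlete 2", "height_cm": 165.0, "gender": "F"},
    {"name": "Athlete 3", "height_cm": 185.0, "gender": "M"},
    {"name": "Athlete 4", "height_cm": 190.0, "gender": "M"},
    {"name": "Athlete 5", "height_cm": 170.0, "gender": "F"},
    {"name": "Athlete 6", "height_cm": 175.0, "gender": "M"},
    {"name": "Athlete 7", "height_cm": 182.0, "gender": "M"},
]


def one_sample_t(values, mu0):
    """Return (x_bar, s, n, t) for a one-sample t-statistic against mu0."""
    n = len(values)
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    t_stat = (x_bar - mu0) / (s / math.sqrt(n))
    return x_bar, s, n, t_stat


# Segregate first: keep only the male athletes' heights.
male_heights = [row["height_cm"] for row in athletes if row["gender"] == "M"]

x_bar, s, n, t_stat = one_sample_t(male_heights, mu0=173.0)
print(f"x_bar={x_bar:.2f}  s={s:.2f}  n={n}  t={t_stat:.3f}")

# The pitfall described above: averaging every row mixes in the women's heights.
all_rows_mean = statistics.mean(row["height_cm"] for row in athletes)
print(f"mean over all rows (wrong for this test): {all_rows_mean:.2f}")
```

`statistics.stdev` uses the n - 1 denominator, which is the same convention as Excel's STDEV.S.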
So that's my sample average height, 183.83 centimeters. Next is my sample standard deviation, using STDEV.S. That's my sample standard deviation. For the number of male athletes, I can do a count here, COUNT: 771 male athletes. And now I can calculate my t-statistic. The t-statistic is calculated as x-bar minus mu, divided by s over the square root of n. So I pick up x-bar, minus mu, where the value of mu is 173 centimeters because that is the population mean we are testing against. Then I divide by s, which is the sample standard deviation, divided by the square root of the sample size, SQRT. I pick up my sample size, which is the number of male athletes in the data. Close parenthesis, one more close parenthesis, and that is my t-statistic. The t-statistic turns out to be 30.0351. Step three of the hypothesis test is to calculate the cutoff value for the t-statistic. This is a single-tail test, and there will be a single rejection region on the right-hand side. For a one-tail test with the rejection region on the right-hand side, the cutoff value is calculated as the positive absolute value of T.INV(alpha, degrees of freedom), where the degrees of freedom are n minus 1. Since no particular value for alpha is mentioned, we will choose alpha to be the industry standard value of 0.05. So t-cutoff is calculated as the absolute value of T.INV, that is, ABS(T.INV(, where ABS is the function for calculating absolute values. Then comes alpha, and since no value for alpha is given, we choose 0.05, followed by a comma and the degrees of freedom, which is n minus 1. So we pick up n, the sample size, from here, 771 minus 1, then a close parenthesis, and one more close parenthesis to match all the open parentheses. That is the value of t-cutoff. Given this cutoff value of +1.6468, the rejection region for the t-statistic is all the area to the right of +1.6468. Step four of the hypothesis test is to check whether the t-statistic falls in any rejection region. The t-statistic does fall in the rejection region on the right-hand side, thus we reject the null hypothesis.
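Steps three and four can be cross-checked with a short Python sketch. This is an illustration under stated assumptions, not the lesson's method: the helper `t_upper_cutoff` is hypothetical and approximates the t quantile with a standard series expansion around the normal quantile, which is adequate for large degrees of freedom such as 770. If SciPy is available, `scipy.stats.t.ppf(1 - alpha, df)` would be the more usual way to obtain the cutoff. The t-statistic and sample size are the values reported above.

```python
from statistics import NormalDist


def t_upper_cutoff(alpha, df):
    """Approximate right-tail t cutoff, playing the role of ABS(T.INV(alpha, df)).

    Series expansion of the t quantile around the standard normal quantile.
    Reasonable for large df; an exact routine is preferable for small df.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2


alpha = 0.05      # chosen because no alpha is specified
n = 771           # sample size used in the n - 1 step above
t_stat = 30.0351  # t-statistic reported above

cutoff = t_upper_cutoff(alpha, n - 1)

# Right-tail test: reject the null when the t-statistic lies to the right of the cutoff.
reject_null = t_stat > cutoff
print(f"cutoff = {cutoff:.4f}, reject null: {reject_null}")
```

Because the rejection region is only on the right, a single comparison `t_stat > cutoff` is enough; a two-tail test would need a cutoff on each side.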
What does rejecting the null hypothesis imply? It implies that we reject the null hypothesis that the population mean height of men athletes at the Olympics is 173 centimeters or less. And since we reject the null hypothesis, we automatically do not reject the alternate hypothesis, which states that the population mean height of such athletes is greater than 173 centimeters. Thus, our final conclusion is that, based on our data evidence, we cannot reject the claim that male athletes at the Olympics are taller than 173 centimeters on average. So that was a hypothesis test involving a single population, the population of men Olympians. We had a sample from the population in terms of our sample data from some past Olympic Games, and we end up not rejecting the claim that the population average height of men athletes is greater than 173 centimeters. Now, let us extend this test further. The empirical study we referred to at the beginning of this lesson found that the average height of men 18 to 45 years old from various walks of life across the globe was 173 centimeters. We then tested this figure against our data on Olympic athletes and found evidence that the population mean height of men Olympic athletes is greater than 173 centimeters. That is, we rejected the claim that their average height is 173 centimeters or less. This empirical study also claimed that the difference in average heights between men and women from various walks of life around the world was 12.5 centimeters. We can test this claim using our sample data and see whether it holds for the population of men and women athletes at the Olympics. Note an important difference here: the claim involves the difference between two population means, the population mean height of men athletes at the Olympics and the population mean height of women athletes at the Olympics.
This important difference slightly changes the way the test is carried out, and the test is now known as the hypothesis test for difference in means, because we are interested in testing claims about the difference between two population means. In this case, that is the difference between the population mean height of men athletes and that of women athletes at the Olympics. We will continue with this difference in means test in the next lesson.
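In symbols, the contrast between the test just completed and the claim to be examined next can be sketched as below. The subscripted symbols are assumed notation, and the precise null and alternate hypotheses for the second claim are left to the next lesson.

```latex
\begin{align*}
\text{Single population mean (this lesson):}\quad & H_0:\ \mu_{\text{men}} \le 173 \ \text{cm} \\
\text{Difference in two population means (claim to be tested):}\quad & \mu_{\text{men}} - \mu_{\text{women}} = 12.5 \ \text{cm}
\end{align*}
```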