For this segment, we're going to look at the different kinds of descriptive statistics, in particular what are called measures of central tendency: the mean, the median, and the mode. Let me describe what each of those terms means. The mean is the arithmetic average of a set of values. Say we have a list of scores on the GMAT or some other standardized test. The individual scores of our participants make up the list, and the arithmetic average of those scores is what we call the mean. The median is what we get if we take all the scores in that list, rank them from highest to lowest or from lowest to highest, and pick the score that's in the exact middle of the ranked scores. The mode is the score in that list that occurs with the highest frequency.
Each of these measures of central tendency is valuable in describing a population: what's going on in that population and what its tendencies are. So, if we're talking about test scores, we can look at a population and identify that it has a higher test score than some other population. Each measure may be used in a slightly different setting, because it's slightly more accurate with respect to what we're trying to uncover about that population. Let's look at some examples and at how we actually calculate these things. I'm going to be using Excel, and I'll slow down at various points so you can see the formulas I'm typing in, but they're pretty straightforward. Let's get started.
Let's start with the mean. Here I have a diagram that shows the difference between a population and a sample.
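As a brief aside, the three definitions given so far can be sketched in a few lines of Python. The lecture itself works in Excel; this is only an illustration using Python's standard `statistics` module, and the scores below are made up for the example rather than taken from any real test.

```python
from statistics import mean, median, mode

# Hypothetical test scores, invented for illustration (not from the lecture).
scores = [520, 560, 560, 600, 640, 700, 710]

score_mean = mean(scores)      # arithmetic average: sum of the scores / count
score_median = median(scores)  # middle value once the scores are ranked
score_mode = mode(scores)      # score that occurs most often in the list

print(score_mean, score_median, score_mode)
```

With an odd number of scores the median is simply the middle one after ranking, which matches the description above.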
The distinction between a population and a sample matters because, depending on whether you're looking at the population (all of the observations) or a sample (some subset of those observations), the mean is going to be written and described slightly differently, even though the arithmetic is the same. In this example, I have a bunch of observations shown as stars. Let's suppose that I go in, take a lasso around some of them, and move just those aside, and say, let's evaluate just a portion of our entire population. That's what we would call a sample. We distinguish the sample mean from the population mean in the notation: x bar usually describes a sample, whereas the Greek letter mu describes the entire population.
This is how you normally read the mean for a sample. We say the mean, or x bar, is equal to the summation, written with the Greek letter sigma, of a set of observed values, the x's, divided by the number of observations, n. So, if we have, let's say, 20 observed values in our sample, then we look at the value for the first, and the second, and the third, and the fourth, and the fifth, all the way up to the 20th. In this example, n would be 20. We add up all of those values and divide by 20, and that gives us our mean.
Let's do an example from data. On this Excel spreadsheet, we have data from the 2016 Major League Baseball season: the time, the duration of each baseball game that the Chicago Cubs played, and the attendance, the number of people who were at Wrigley Field for each game that the Chicago Cubs played during the 2016 season. I downloaded these from the Major League Baseball website, so here we have the time, the duration of the game, and here we have the attendance. Let's suppose we wanted to ask: what was the average duration of a baseball game?
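For reference, the formula read aloud above can be written out as follows. The capital N for the number of observations in the whole population is a conventional symbol assumed here; the lecture only names n for the sample.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\qquad
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
```

With 20 observed values in the sample, n = 20: add up x_1 through x_20 and divide by 20.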
To find the average game duration, I'm going to type in =AVERAGE and highlight the entire column, all of my observations, and this tells me what the average is: the average baseball game lasted three hours and four minutes. That's the population, so we're summing the values for games 1 through 162, because there are 162 games, adding up the time of each of those games and then dividing by 162 to give us the average game duration.
Now let's suppose we wanted to find the average of a sample of baseball games. Right here are the last seven games or so in the list. To do that, I type in =AVERAGE, highlight just these games, and there we go. The average of these games is four hours and 39 minutes. Now, that's a little unfair to start with, because before I put this data in here, I ranked it from the shortest game, which was a rain-delayed game that only lasted a few innings, to the longest game. So these data aren't arranged from the first game to the second game to the third game; they've already been ranked from top to bottom. If I take a sample from just the last few games in this list, clearly my sample mean is going to be different from my mean for the entire population.
We want to be careful when we're drawing a sample and looking at the sample mean, so that we're taking a random sample of the population. In this case, if I rank all the games from the shortest to the longest and then draw a sample from just the last few, which are the longest games, my sample mean is going to be dramatically different from my population mean. Here is a different list, and I've got the games here in the order they were played, from the first game to the second game to the third game, and so on.
If I scroll down here, I'm going to take the average of all of these games again, and you'll notice it's =AVERAGE once more. With the same calculation, the same set of keystrokes, I'm getting the same population mean: three hours and four minutes. These games run from the first game to the second game, all the way through the 162nd game. Now let's take a sample, and since the games aren't in any particular order other than when they were played during the season, let's suppose I just take about 10 games from the middle of the season. Let's say games 100 through 110. I'm going to treat those as my sample, so my sample mean is the average of games 100 to 110. You'll notice that in this case the sample average is very close to the population average.
These two examples highlight how you actually calculate the average using Excel: type in =AVERAGE and then highlight the observations you're looking at. If you highlight all of the observations for the entire population, you get the population mean, or the population average. If you only highlight a subset of the observations, you get the sample mean. In the first example, when I had ranked the games, we grabbed part of the list and it was a biased draw from our population, so our sample mean was different from our population mean. In this example, it's a much less biased draw; I just grabbed a few observations in the middle.
I'm going to do one more. Let's draw 10 games at the end of the season, say games 150 through 160. Now we've got an average game time of about two hours and 57 minutes. Let's do one more. Here are some games, let's say 40 through 50. All right. Again, this one is also two hours and 57 minutes.
So, we're not getting dramatically different sample means based on where in the season we draw our samples, and if I just grabbed a random 10 observations here, I could calculate the sample mean in exactly the same way.
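The comparison walked through above can be sketched in Python as well. This is a minimal illustration under stated assumptions, not the spreadsheet from the lecture: the durations are randomly generated stand-ins measured in minutes (they are not the 2016 Cubs data), and the variable names are hypothetical.

```python
import random
from statistics import mean

# Hypothetical game durations in minutes, in the order the games were played.
# Generated for illustration only; these are NOT the 2016 Cubs data.
rng = random.Random(0)
durations = [rng.randint(150, 220) for _ in range(162)]

# Population mean: the average over all 162 games.
population_mean = mean(durations)

# Biased draw: rank the games by duration, then average only the seven longest.
ranked = sorted(durations)
biased_sample_mean = mean(ranked[-7:])

# Consecutive block from the unranked list: games 100 through 110.
block_sample_mean = mean(durations[99:110])

# Simple random sample of 10 games, drawn without replacement.
random_sample_mean = mean(rng.sample(durations, 10))

print(population_mean, biased_sample_mean, block_sample_mean, random_sample_mean)
```

The mean of the seven longest games can never fall below the population mean, which is the bias described above. The block sample and the random sample have no such built-in tilt, so they tend to land nearer the population mean.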