We're now going to look at run values in more detail and break them down into particular categories to see what statistics we get. We'll do that by generating the run expectancy matrix again, this time for more than one season, and use it to compare performance across seasons. We start, as usual, by loading the packages we need for our statistical analysis.
The next thing we do is create a function that generates the run expectancy matrix for any given season. The function is written out here. Every function definition starts with the keyword def, and we call this one run expectancy. The function then has a list of commands that it runs in order to complete the process. This might look complicated, but each of these lines is a command we used last week to generate the run expectancy matrix; they have simply been copied one underneath the other to produce one long function. You could go back to last week's session and copy each of those lines yourself to reproduce it. In fact, if you're not sure, you should do exactly that to convince yourself that this is all that has been done here. I have just cut and pasted those lines from last week's notebook into this one to build the function. All of those steps are here, and the last line is a return statement, which hands back RE, the run expectancy matrix.
The other point to remember is that, right at the beginning, we have to tell the function where to find the data, and that is what path refers to here. The path is the function's argument, so we have to supply it whenever we run the function. Running this cell just loads the function, and it is then ready to run any time we give it the path to the data for which we want the run expectancy matrix.
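To make the pattern concrete, here is a minimal sketch of wrapping the steps in a function that takes a path and returns RE. It is not the notebook's actual code: the column names base_state, outs and runs_rest_of_inning are assumptions made up for illustration, and a small in-memory CSV stands in for a real season file.

```python
import io

import pandas as pd


def run_expectancy(path):
    """Return a run expectancy matrix for the play-by-play data at `path`.

    Assumes (hypothetically) one row per play with the columns
    'base_state', 'outs' and 'runs_rest_of_inning'.
    """
    plays = pd.read_csv(path)
    # Average the runs scored in the rest of the inning for each
    # combination of base state and outs, then pivot outs into columns.
    RE = (
        plays.groupby(["base_state", "outs"])["runs_rest_of_inning"]
        .mean()
        .unstack("outs")
    )
    return RE


# Toy data standing in for a season file; the values are made up.
toy_csv = io.StringIO(
    "base_state,outs,runs_rest_of_inning\n"
    "empty,0,0\n"
    "empty,0,1\n"
    "1B,0,1\n"
    "1B,1,0\n"
)
RE_toy = run_expectancy(toy_csv)
print(RE_toy)
```

Because the data location is the only argument, generating the matrix for another season is a single call with that season's file path in place of the toy buffer.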
For example, if we want the run expectancy matrix for 2017, we just tell it to run the run expectancy function here. Inside the parentheses we put the path, which is where it will find the data, and then we ask it to display the run expectancy matrix when it has finished. If we run that, it produces the matrix automatically, and there we have the run expectancy matrix for 2017. We can repeat this for as many data sets as we have; for example, we can repeat it for the 2016 season. Here we have run expectancy 16: again we tell it to run the run expectancy function, give it the path where it will find the data, and display the result. When we run that, we get the run expectancy matrix for 2016. As a self-test, to convince yourself that this works, you might now run it for 2015 and generate the run expectancy matrix for that season. This is a good illustration of the power of something like Python to generate statistics of interest easily and quickly once you've established the basic structure of the data.
We're now going to go through a series of exercises looking at different ways we can use this data and different things we can learn from it. The first is to look at the event run values themselves. Each possible event in the game has a run value, and we can identify what that run value was in each of the seasons, because the run expectancy matrix is calculated season by season. Here, for example, let's look at the run value of each event in 2016. We do this with a group by: we group the run values by event and calculate the mean, which answers the question, what is the average run value of each event? We then generate a column with the run value in 2016. If we run this, you can see each possible event with its run value in 2016. There are quite a few events: 31 different ones in our list.
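The group by step just described can be sketched as follows. This is only an illustration: the data frame name plays16 and the columns event and run_value are assumed names, and the numbers are made up rather than taken from the season data.

```python
import pandas as pd

# Toy play-level data: one row per play with its run value (made-up numbers).
plays16 = pd.DataFrame(
    {
        "event": ["Single", "Single", "Home Run", "Strikeout", "Strikeout"],
        "run_value": [0.40, 0.50, 1.40, -0.30, -0.20],
    }
)

# Group the run values by event, take the mean, and label the
# resulting column with the season it belongs to.
RV16 = (
    plays16.groupby("event")["run_value"]
    .mean()
    .reset_index()
    .rename(columns={"run_value": "RV16"})
)
print(RV16)
```

The result has one row per event, so with the real data it would have as many rows as there are distinct events in the season.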
You can see here what their run values were. For example, near the bottom, in row 25, a Single had a run value of 0.44, and in row 16 a Home Run had a value of 1.38, and so on. Of course, some of these events are far more common than others. Batter Interference and Fan Interference are rather rare events, but they do happen. We have the run value for each possible event, and some are positive and some are negative, depending on the context in which they occur.
We can now compare that with the run values of the events in 2017. This is the same operation, now applied to the run expectancy matrix for 2017 rather than 2016. Again, we calculate the run values. The output looks very similar to the previous one: we get a whole series of numbers. What we want to do is compare the run values in the two seasons to see whether the run values of different events are relatively stable from year to year, so we merge the two seasons to produce the comparison. After the merge, you can see the run value of each event in each season. If you look at the first row, Batter Interference, there is actually quite a big difference between its run value in 2016 and in 2017. On the other hand, this is a relatively rare event. Bunt Groundout, in the second row, has roughly the same value in both seasons. Bunt Lineout has a slightly larger gap. If we go down to events that are much more common, like a Double, the values are again very close. If we scroll down and look at some of the others, for example the value of a Home Run in row 16, you can see that it is almost identical: 1.382 versus 1.378. Really very close.
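The merge of the two seasons can be sketched like this. The frame and column names (RV16, RV17, event, diff) are assumptions for illustration; the Home Run and Single figures are the ones quoted in this walkthrough, while the Double row is made up to pad out the example.

```python
import pandas as pd

# Per-event run values for each season. Home Run and Single use the
# figures quoted in the walkthrough; the Double values are made up.
RV16 = pd.DataFrame(
    {"event": ["Double", "Home Run", "Single"], "RV16": [0.75, 1.382, 0.44]}
)
RV17 = pd.DataFrame(
    {"event": ["Double", "Home Run", "Single"], "RV17": [0.76, 1.378, 0.4556]}
)

# Merge on the event name so each row holds both seasons' values.
# The default inner join keeps only events present in both seasons.
RV = pd.merge(RV16, RV17, on="event")

# A difference column makes the season-to-season gap easy to scan.
RV["diff"] = RV["RV17"] - RV["RV16"]
print(RV)
```

One thing to keep in mind with an inner join is that an event recorded in only one of the two seasons would drop out of the merged table.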
Likewise, if we go down to, say, a Single in row 24, you can see that a Single had a run value of 0.44 in 2016 and 0.4556 in 2017, which again is very close. To illustrate how stable these values are from season to season, we can calculate a correlation. Let's look at the correlation between the run values in 2016 and 2017, which is this simple line here. You can see that the correlation coefficient is 0.994, almost 0.995. This shows how stable the run values are from year to year: the game itself is relatively stable from season to season, in the sense that the same kinds of actions produce roughly the same kinds of outcomes. The identity of the players might change, but the game remains essentially the same. That concludes our look at the event run values. We're now going to move on and look at the run values for some of the players.
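For reference, the correlation between the two seasons' run values discussed above can be computed along these lines. This is a sketch on toy numbers with assumed column names RV16 and RV17, so it will not reproduce the 0.994 obtained from the full list of events.

```python
import pandas as pd

# Merged per-event run values for two seasons (toy numbers, assumed names).
RV = pd.DataFrame(
    {
        "event": ["Single", "Home Run", "Double", "Strikeout"],
        "RV16": [0.44, 1.38, 0.75, -0.25],
        "RV17": [0.46, 1.38, 0.76, -0.26],
    }
)

# Pearson correlation between the two season columns.
r = RV["RV16"].corr(RV["RV17"])
print(round(r, 3))

# The same number appears off the diagonal of the correlation matrix.
print(RV[["RV16", "RV17"]].corr())
```

A coefficient close to 1 says the events keep nearly the same relative values in both seasons, which is the stability the comparison is meant to show.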