So it probably won't come as a surprise that we're going to round off this analysis of run expectancy data by looking at salaries, which has been the theme throughout this book, based on the Moneyball story about undervalued resources. The question we're going to look at here is whether there is a correlation between run values, win percentages, and the salaries paid to the players, so really extending the question to run values. Are they reflected in salaries? Are salaries in any way predictive of run values and win percentages in the future?

In order to look at the salaries, we need to obtain the salary data, which comes from the Lahman database. If we just read in the data, you can see here we've already loaded up the win percentages for 2016 and 2017 and the salaries paid out by each team in the 2016 season. So we can use this data to look at performances and make these comparisons. We want to merge this data with the runs value data, so we take the runs value data that we already created and use the merge function to combine it with this salary and win percentage data. When we merge, we can see here we've got a list of all of the teams, and we've got all of these variables: win percentage in 2016 and 2017, runs value in 2016 and 2017, and then the salary level in 2016.

We can now run some scatter plots as we've done earlier on. If we just run a scatter plot like this, for example, we have here the regression of runs value in 2016 on win percentage in 2016. Effectively, what was the correlation between these two variables? You can see here there's a reasonable fit between the two. We'll look at that more precisely in terms of correlation coefficients in a minute. But you can see that there's some reasonable correlation between runs value in a season and win percentage in that season.
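To make the merge and the first regression line concrete, here is a minimal sketch of the steps just described. It assumes pandas and NumPy are available. The team codes, the column names (`wpc_2016`, `rv_2016`, `salary_2016` and so on) and all of the numbers are invented for illustration; they are not the lecture's data or variable names. Fitting win percentage on runs value, rather than the other way round, is also a choice made here.

```python
import numpy as np
import pandas as pd

# Hypothetical team-level data: win percentages and 2016 salaries.
# All values are made up purely to show the mechanics.
team_info = pd.DataFrame({
    "team": ["AAA", "BBB", "CCC", "DDD"],
    "wpc_2016": [0.40, 0.50, 0.55, 0.60],
    "wpc_2017": [0.45, 0.48, 0.52, 0.58],
    "salary_2016": [90e6, 120e6, 150e6, 200e6],
})

# Hypothetical runs value totals per team for the same two seasons.
runs_value = pd.DataFrame({
    "team": ["AAA", "BBB", "CCC", "DDD"],
    "rv_2016": [-40.0, 0.0, 10.0, 30.0],
    "rv_2017": [-25.0, -5.0, 5.0, 20.0],
})

# Merge on the team key; validate guards against duplicated teams.
merged = pd.merge(runs_value, team_info, on="team",
                  how="inner", validate="one_to_one")

# Least-squares line of 2016 win percentage on 2016 runs value,
# the same line a regression scatter plot would draw.
slope, intercept = np.polyfit(merged["rv_2016"], merged["wpc_2016"], deg=1)
print(merged)
print(f"slope={slope:.5f}, intercept={intercept:.3f}")
```

With real data, `merged` would have one row per team, and the fitted line could be overlaid on a scatter plot with whichever plotting library you prefer.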
So runs value really does measure something about the capacity to win games and achieve a high win percentage. Let's now look at the relationship between the salary level in 2016 and the runs value in 2016 and see what that looks like. Again, there's a correlation, but you can see here it's a little bit weaker. If you compare the two charts, you can see from the slope of the line that the salary and runs value regression line is somewhat shallower, slightly flatter, than the runs value and win percentage regression line. That reflects a slightly lower correlation, a slightly lower capacity to explain the data. But it's not so very different in fact, so roughly similar.

Now let's look at runs value in 2016 and win percentage in the following season, 2017. To what extent might we think that runs value in one season is predictive of success in the following season for a team? We can see here, again, there is some correlation. It's not perfect by any means, but there is definitely a positive correlation between runs value in the previous season and win percentage in the following season. So teams with a high runs value seem able to convert that into success in the following season. In some sense, they're managing to retain some stability in performance from one season to the next.

Then finally, we look at the relationship between the salary in 2016 and the runs value in 2017. To what extent does salary in some sense predict the runs value for the following season? If we look at the regression line there, we can see there's almost no relationship at all. The regression line is almost horizontal, so there's very little predictive value in the salaries in the previous year for the runs value in the following year. If we look at the correlations, we can generate a correlogram for all of these variables, and we can see the following relationships.
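The correlogram mentioned above is built from a pairwise correlation matrix. Here is a minimal sketch of how one might compute it, assuming pandas. The column names and every number below are invented for illustration, so the resulting coefficients will not match the ones quoted in the lecture.

```python
import pandas as pd

# Hypothetical merged team-level table; all values are made up.
merged = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E"],
    "wpc_2016": [0.42, 0.47, 0.52, 0.56, 0.61],
    "wpc_2017": [0.46, 0.44, 0.55, 0.50, 0.60],
    "rv_2016": [-35.0, -10.0, 5.0, 20.0, 40.0],
    "rv_2017": [-20.0, -25.0, 15.0, 0.0, 30.0],
    "salary_2016": [95e6, 160e6, 110e6, 180e6, 140e6],
})

numeric_cols = ["rv_2016", "rv_2017", "wpc_2016", "wpc_2017", "salary_2016"]

# Pairwise Pearson correlations (the default method of DataFrame.corr).
corr = merged[numeric_cols].corr()
print(corr.round(2))

# A single entry: next season's runs value against this season's salary.
r_salary_next_rv = corr.loc["rv_2017", "salary_2016"]
print(f"corr(rv_2017, salary_2016) = {r_salary_next_rv:.2f}")
```

Unlike the slope of a regression line, which depends on the units of each variable, the correlation coefficient is unit-free and always lies between -1 and 1, which makes it the safer number for comparing the charts. The matrix can be passed to a heat map function in a plotting library to draw the correlogram itself.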
Again, note that the correlation between runs value from one season to the next is modest, but the correlation between runs value in a season and win percentage in that season is there. In fact, looking at the top row, you can see the correlation between runs value in 2016 and win percentage in 2016 was [inaudible] roughly, and that's a reasonable level of correlation. But if you look at the second row, the correlation between runs value in 2017 and win percentage in 2017 is pretty high. In fact, it's 0.72. In that sense, runs value was a fairly accurate measure of team success as measured by win percentage. You can also see that the correlation between [inaudible] and salaries in 2016 is quite solid. It's 0.45, so there's a fairly good correlation there. But then if you look at the next row and the last column, the correlation between runs value in 2017 and salary in 2016 is really quite small, about 0.11, which is why that regression line looks flat: there is very little correlation between those two.

This gives us some basis for thinking about the relationship between these statistics. We might ask the question, why is it that salaries are not a good predictor of success? There are a number of reasons that might come to mind. One explanation is that salaries are not that correlated with productivity because there isn't a perfectly competitive market for players in baseball. We discussed before how rookies and arbitration-eligible players have limited bargaining power, and that might mean that salaries are not reflective of expected ability. A second possibility is that salaries are more backward-looking. That should in some sense be a surprise. You'd think that salaries would reflect expected future performance rather than simply rewarding past performance.
But that also is a function of how you think that markets work. Perhaps you might disagree on that point, and you might think that it actually makes more sense that they're rewarding past performance. And of course, in this context, it's obvious that the data is by no means complete. The salary data is not complete, and it's sometimes hard to match up the exact salaries that are paid to the players with the year in which their performance is being measured. It might just be statistical noise in the relationship which is producing this very low correlation.

Nonetheless, I think we've shown here that runs value is at least a pretty good measure of player productivity. It helps us to understand more about players' performance and gives us a better idea of how the game runs. One can go on from this to generate more complex statistics about individual players or about team performance. A lot of what is going on in the baseball world today is looking at more and more of these measures [inaudible] more and more detail in order to see where competitive advantage lies, to see where opportunities exist that perhaps haven't been identified before.

If you're interested in producing your own value of wins above replacement, then you can certainly do that on the basis of what we've looked at here. If you want more guidance on how to do that, it's contained in a good paper by Baumer, [inaudible] and Matthews called openWAR, which you can download for free off the web. It shows you ways in which you can make adjustments, but you can also move on to generate your own adjustments based on the data that we've looked at, in order to produce your own proprietary analysis and statistics for player and team performance. That's really been the theme across this entire [inaudible]: what we've been trying to show you is how you can use the data to generate your own statistical analysis.
We've shown you some of the basic introductory ways of manipulating the data. Hopefully, now equipped with this, you can go on and produce your own original statistics, which will produce, with any luck, another Moneyball moment.