In this video, we're going to continue our run value analysis by looking at changes in run values from year to year, both for players and for teams. To start out, we're going to go through a similar process as we did with our event run values. We have a groupby statement here, and we're creating this DataFrame, Player_value16. We're grouping by batter ID and batter name, and we follow that with Run_Value.sum, so we're summing the run value by batter ID and batter name for our RE_16 data. Note that we're taking the sum of player run values, rather than the mean as we did for event run values, because we want to see the total contribution of a player's run value to his team in a given season. Then we're going to reset our index and rename our column from run value to RV16. We can go ahead and run that line, and now we have player run values for 2016.
We're going to go through the same type of process for our RE_17 data. We're going to create this DataFrame, Player_value17. Once again, we're going to group by batter ID and take Run_Value.sum, so we're summing the run values by batter ID in the RE_17 data. Note that we don't actually have to include batter name in this data set. The reason is that in the next line we're going to merge together the Player_value16 and Player_value17 data sets. Since Player_value16 already includes the player name, that name variable is going to be included in our merged data set for all players that are in both data sets. Everything after that is the same: we reset our index and rename the column from run value to RV17. We go ahead and run that, and now we have the batter ID and each player's run value in 2017.
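The two groupby steps described above can be sketched roughly as follows. This is a minimal illustration on toy data, not the notebook's actual code: the column names `batter_id`, `batter_name`, and `run_value` and all of the values are assumptions made for the example.

```python
import pandas as pd

# Toy stand-ins for the RE_16 and RE_17 play-by-play data.
# Column names and values are hypothetical; the real data is much larger.
RE_16 = pd.DataFrame({
    "batter_id": [1, 1, 2, 2, 3],
    "batter_name": ["A", "A", "B", "B", "C"],
    "run_value": [0.5, -0.25, 1.0, 0.25, -0.5],
})
RE_17 = pd.DataFrame({
    "batter_id": [1, 2, 2, 4],
    "run_value": [0.25, -0.5, 0.75, 1.5],
})

# 2016: total (not mean) run value per batter, keeping the name for later.
Player_value16 = (
    RE_16.groupby(["batter_id", "batter_name"])["run_value"]
    .sum()
    .reset_index()
    .rename(columns={"run_value": "RV16"})
)

# 2017: the name is not needed here because it will come from the 2016 table.
Player_value17 = (
    RE_17.groupby("batter_id")["run_value"]
    .sum()
    .reset_index()
    .rename(columns={"run_value": "RV17"})
)

print(Player_value16)
print(Player_value17)
```

Summing rather than averaging means a player with more plate appearances can accumulate a larger total, which matches the idea of measuring total contribution over a season.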
Now we can go ahead and merge together our Player_value16 and Player_value17 data on batter ID, which is the unique variable that links the two data sets together. We go ahead and run that. Now we have the batter ID, and as I said before, we have the batter name because we included it in our Player_value16 data. It gets carried into our new merged data, called players, along with the RV16 and RV17 run values. If you take a look at some of the observations, you can see that there's quite a bit more variation in player run value from 2016 to 2017, especially compared to the event run values that we saw earlier. But this is probably to be expected, as there is a lot more variation in individual player performance than we would expect for event run values from year to year.
Let's take a look at this a bit more objectively now. We are going to make a scatter plot using this sns.regplot function. On the x-axis we're going to have our RV16 data, and on the y-axis our RV17 data. Color equals black tells Python what color to make the plot. fit_reg equals True fits a regression line through the points, so we'll be able to see the straight line that goes through the scatter plot. In scatter_kws, the s equal to 5 sets the size of the data points in our plot. And the data that we're using is our players data. We run that and we get the following plot. Just looking at the plot, we can see there's definitely a positive relationship between RV16 and RV17: as the values for RV16 increase, the values for RV17 tend to increase as well. But it's more of a moderate correlation, not a very strong correlation like we saw with event run values. If these points were scattered very tightly along the line, that would indicate a very strong correlation.
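As a rough sketch of the merge and the plotting call just described, assuming hypothetical column names and toy values, and assuming the merge uses the pandas default (an inner join, which keeps only batters present in both seasons):

```python
import pandas as pd

# Toy per-season totals; names and values are made up for illustration.
Player_value16 = pd.DataFrame({
    "batter_id": [1, 2, 3],
    "batter_name": ["A", "B", "C"],
    "RV16": [0.25, 1.25, -0.5],
})
Player_value17 = pd.DataFrame({
    "batter_id": [1, 2, 4],
    "RV17": [0.25, 0.25, 1.5],
})

# Inner join on the shared key: batter 3 (2016 only) and batter 4 (2017 only)
# drop out, and batter_name comes along from the 2016 table.
players = pd.merge(Player_value16, Player_value17, on="batter_id")
print(players)


def plot_players(df):
    """Scatter plot of RV17 against RV16 with a fitted regression line."""
    # Imported inside the function so the merge above runs even where
    # seaborn and matplotlib are not installed.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.regplot(
        x="RV16", y="RV17", data=df,
        color="black",         # color of the points and the line
        fit_reg=True,          # draw the fitted regression line
        scatter_kws={"s": 5},  # marker size of the points
    )
    plt.show()
    return ax
```

Calling `plot_players(players)` would draw the plot. With only two rows in this toy table the picture is not meaningful; the point is the shape of the call.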
The points in our plot are more widely scattered around the line, so this looks to be more of a moderate correlation for player run values from 2016 to 2017. Let's go ahead and take a look at what the actual correlation coefficient is, though. We have this np.corrcoef function, and we pass it players RV16 and players RV17. We get a value of about 0.45, which confirms what we observed in our scatter plot: a positive, moderate correlation. It's weaker than what we saw with our event run values, and if you think about it intuitively, that's probably to be expected. From year to year we expect players' performance to change a decent amount. As players get older, their skills deteriorate, and as younger players get more experience in the league, they tend to improve. We also see injuries from time to time. So there's a lot more variation in individual player performance than in event run values. However, there still looks to be some important information in a player's previous season's performance for projecting his next year's performance. It looks like it could be a decent baseline if you wanted to project a player's future performance. But one needs to remember that there are many other sources of variation to take into account if you want to predict future performance.
Let's go ahead now and take a look at team run values. We've looked at event run values and we've looked at player run values, so let's see how team run values change from year to year. To start out, we're going to create this variable team in our RE_17 data set. The reason we're doing this is to note the batting team in our data set. In our initial data, we have who the home team is and who the away team is, but we don't have a variable that denotes which team is batting at all times. That's what we're creating here with this team variable.
We're using a where statement here, and we're going to say where RE_17 half, which is the half inning, equals top. Whenever it's the top half of the inning, we're going to set the value of team equal to the away team; otherwise we're going to set it equal to the home team. If you're familiar with baseball, you know that the away team bats in the top half of the inning and the home team bats in the bottom half. Let's go ahead and run that piece of code. If we scroll over to the right-hand side, we see that we now have a variable team that denotes the batting team at all times. Let's go ahead and do the same thing for our RE_16 data. Again, we use a where statement: where the half inning equals top, we set the value of team equal to the away team, and otherwise we set it equal to the home team. From there we get, over on the right-hand side again, our new variable team that denotes the batting team.
Now we can continue with a similar type of process as we've done in the previous couple of videos. For the RE_17 and RE_16 data, we're going to create these DataFrames, RE team 17 and RE team 16. We start out with a groupby, grouping by team, and then we add run value sum, so we're taking the sum of the run values for each team in both 2016 and 2017. Once again, we're going to reset our index and rename our columns from run value to RV17 and RV16 respectively, and then we're going to merge our 2016 and 2017 data on the variable team. Let's go ahead and run that piece of code. Now we have all of our teams and the aggregate run values for each team in 2016 and 2017. Once again, we see a decent amount of variation between the two seasons at the team level.
Let's continue in a similar fashion as we did with our player analysis and look at a scatter plot for team run values from 2016 and 2017. Again, on the x-axis we have RV16 and on the y-axis RV17, with color equal to black.
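The batting-team variable and the team totals described above can be sketched along these lines. The column names (`half`, `away_team`, `home_team`, `run_value`), the `"Top"` label, the helper function names, and all values are assumptions for illustration; the real data may encode the half inning differently.

```python
import numpy as np
import pandas as pd


def add_batting_team(df):
    """Add a 'team' column: away team bats in the top half, home team otherwise."""
    out = df.copy()
    out["team"] = np.where(out["half"] == "Top", out["away_team"], out["home_team"])
    return out


def team_totals(df, new_name):
    """Sum run value by batting team and rename the total column."""
    return (
        df.groupby("team")["run_value"]
        .sum()
        .reset_index()
        .rename(columns={"run_value": new_name})
    )


# Two toy games per season: NYA at BOS, then BOS at NYA (hypothetical values).
games = {
    "half": ["Top", "Bottom", "Top", "Bottom"],
    "away_team": ["NYA", "NYA", "BOS", "BOS"],
    "home_team": ["BOS", "BOS", "NYA", "NYA"],
}
RE_16 = pd.DataFrame({**games, "run_value": [0.25, 0.5, -0.5, 1.5]})
RE_17 = pd.DataFrame({**games, "run_value": [0.5, -0.25, 1.0, 0.75]})

RE_team16 = team_totals(add_batting_team(RE_16), "RV16")
RE_team17 = team_totals(add_batting_team(RE_17), "RV17")

# One row per team, with both seasons side by side.
teams = pd.merge(RE_team16, RE_team17, on="team")
print(teams)
```

Wrapping the two steps in small functions is only a convenience here, since the same logic is applied to both seasons; running the steps inline for each season gives the same table.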
For this team scatter plot, we are again going to fit a regression line. The s equal to 5 in scatter_kws sets the size of the points, and the data we're using is our RE team 16, 17 data. Let's go ahead and run that. Now we can see the scatter plot, and once again we see a positive relationship between RV16 and RV17: as run values for 2016 increase, the run values for 2017 tend to increase as well. But once again, we see a moderate correlation at best, as the points are pretty widely scattered around our regression line. Let's take a look at the actual correlation coefficient between 2016 and 2017 using our corrcoef function. We go ahead and run that, and we get a value of 0.35.
So we see a weaker year-to-year correlation for team run values than we saw for player run values, which intuitively makes sense if you think about it. All the sources of variation we talked about for player run values are relevant for team run values, but in addition there are other factors, such as roster turnover. Players sign with different teams, players retire, new players come up from the minor leagues, and rosters are typically a bit different from year to year. That's another layer of variation that can explain some of these changes from RV16 to RV17. But once again, there is a positive correlation, and there does seem to be some useful information if one wishes to project a team's next season's performance from its previous season's performance. However, one needs to keep in mind that there are several other factors to take into account when projecting performance.
We've taken a look at event run values from year to year, and we've taken a look at player run values and team run values from year to year. In the next phase, we're going to bring salaries into the analysis and see how they relate to all the things we've talked about. We'll get to that in our next video.
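For reference, here is a small sketch of how the np.corrcoef call used in this video returns its result. The table below is made up for illustration (five hypothetical teams, not real totals), so the coefficient it produces has nothing to do with the 0.45 and 0.35 values reported above.

```python
import numpy as np
import pandas as pd

# Hypothetical team totals for two seasons; values are invented.
teams = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E"],
    "RV16": [40.0, -15.0, 5.0, -30.0, 20.0],
    "RV17": [10.0, -20.0, 25.0, -5.0, 15.0],
})

# np.corrcoef returns a 2x2 correlation matrix:
#   [[corr(RV16, RV16), corr(RV16, RV17)],
#    [corr(RV17, RV16), corr(RV17, RV17)]]
corr_matrix = np.corrcoef(teams["RV16"], teams["RV17"])

# The year-to-year correlation is the off-diagonal entry.
r = corr_matrix[0, 1]
print(corr_matrix)
print(r)
```

The same call applies to the player table by passing its RV16 and RV17 columns. The diagonal entries are always 1, so the number of interest is either off-diagonal entry.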