In this third module we're going to cover a complete predictive modeling and scoring application from A to Z, so bear with me. The first step is, as usual, to load the data, rename the columns correctly, and compute the date of purchase, the year of purchase, and the number of days between each purchase and January 1st, 2016, which will later be used to compute recency. I'm going pretty quickly over all these things.

The next step is to extract all the predictors that we're going to use in our predictive model. Remember that, as we said, the predictors are variables computed a year ago, and these data will be used to predict what happened over the last 12 months. So, exactly as we've done in the previous tutorial, we're going to compute everything we know about the customers at the end of 2014. These are the predictors. And then we are going to look at what they did in 2015. These are the target variables we are going to predict.

The next step is to merge these two together. Again, exactly as in the previous tutorial for module number two, we're going to merge the customers from 2014 with the revenue they generated in 2015, making sure that all the customers in the first data set remain in the data. We are going to call this data set the in-sample data, meaning that we're going to run in-sample predictions on it; later on we'll run out-of-sample predictions on the customers of 2015. Again, we merge everything, we transform the not-applicable values into zero, and we have our revenue in 2015, which is how much money each customer spent in 2015. And many of them spent zero.

We're going to create a new variable indicating whether they spent anything, and we're going to call that variable active_2015. Basically, we look at the revenue in 2015: if it's above zero, it's a yes; if it's zero, it's a no. And we'll store that value as numeric, so instead of storing TRUE/FALSE, we're going to store zeros and ones. Pretty standard here. We execute everything, and the next step is just to look at what we've created. This is the data on which we're going to calibrate our predictive models. We have recency, first purchase, frequency, average purchase amount, and maximum amount spent, which is a new variable we're going to use in this module, and then how much they've spent and whether or not they've spent anything. So obviously, if there is a zero here, you have a zero there; if there is a positive value, you have a one over here. That's our calibration data.

Now we are going to calibrate the first model, which is the probability model: the likelihood that a customer will be active in 2015 or not. We are going to use the nnet library, which contains the function multinom. Multinom stands for multinomial model. It's an extremely useful model to predict outcomes that can either be zero or one and nothing else, which is exactly what we'd like to do here. So we're not going to use a traditional linear model, as we will later on; we use a binary model where the outcome can either be zero or one. And here is how it works: the output of the model, which I call prob.model, is the output of the multinom function, with a formula that states that active_2015 is a function of recency, first purchase, frequency, average amount, and maximum amount, the last of which we have introduced just for fun. And the data is the calibration data we created, called in_sample. Running this will fit, or calibrate, the entire model on the data set.
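Here's a minimal sketch of what this step might look like in R. The data-frame and column names (in_sample, revenue_2015, recency, and so on) are illustrative assumptions, not necessarily the exact names used in the course files:

```r
library(nnet)

# Flag whether each customer spent anything in 2015, stored as 0/1
# rather than TRUE/FALSE (column names are assumptions)
in_sample$active_2015 <- as.numeric(in_sample$revenue_2015 > 0)

# Binary probability model: likelihood of being active in 2015,
# as a function of the five predictors computed at the end of 2014
prob.model <- multinom(
  formula = active_2015 ~ recency + first_purchase + frequency +
            avg_amount + max_amount,
  data    = in_sample
)
```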
And then we are going to extract the coefficients and the standard errors of these coefficients, and output not only the coefficients and the standard errors, but the ratio of the two as well. Let me execute this first. So, the model converged. If you look at the sign of the recency parameter, for instance, it's negative, which makes perfect sense, right? The larger the recency, meaning the more days have elapsed between the last purchase and today, the less likely you are to make any other purchase in the future. So if your last purchase was three, four, ten years ago, it's extremely unlikely that you'll make a purchase any time soon. Hence the sign of the parameter is negative: the higher the recency, the lower the probability. If you look at frequency, however, that parameter is positive, meaning that the more purchases you've made in the past, the more likely you are to make additional purchases in the future, which also makes perfect sense.

These two parameter values are the most interesting, simply because of the ratio between each coefficient and its standard error, which usually indicates to what extent each parameter value is significant or not. If that ratio is above two or below minus two, it's usually a good sign, and as you can see here, recency is huge: minus 32, way, way below minus two, so it's highly significant. The ratio of the frequency coefficient to its standard error is extremely high as well, close to 15. All the others are pretty close to zero, or at least not nearly as large, so the impact of first purchase, average amount, and maximum amount on the predictions is actually pretty limited.

So now we have created our probability model, and we have stored everything we need in that variable to make further predictions later on. What we'd like to do next is to predict: if you're going to be active, how much are you going to spend with that specific retailer over the year 2015? The issue here, as we've discussed in the previous video, is that this model can only be calibrated on those customers who actually purchased something. So we need to subsample, taking only those customers who were active in 2015, so we can calibrate the model and estimate how much they spend. What we are going to do is take the in-sample data, look at the variable active_2015, find which entries are equal to one, and store the index of those customers in a variable that we'll call z for the time being. z will be a vector indicating which customers have been active in 2015, and only on those customers will we calibrate the second, monetary model.

So we run that. If you look at the head of the data, keeping only the rows in the index z, you can see that all these customers have active_2015 equal to one, which is exactly what we wanted, and all of them have spent something in 2015. And finally, if you look at the variable active_2015 across the whole subsample, everything is equal to one: we only have active customers in there. In terms of revenue, these customers have spent anything between $5 and $4,500 with that retailer over the year 2015. Now what we are going to do is calibrate the monetary model, meaning we're going to predict how much they spent in 2015 based on only two things: the average amount they usually spend, and the maximum amount they have ever spent. So we have two different predictors.
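A compact sketch of these two steps, under the same naming assumptions as before (summary.multinom does expose coefficients and standard.errors components, which is what the ratio check relies on):

```r
# Coefficients, standard errors, and their ratio; the ratio is a rough
# significance check, where |ratio| > 2 is usually a good sign
coefs <- summary(prob.model)$coefficients
stds  <- summary(prob.model)$standard.errors
print(coefs)
print(stds)
print(coefs / stds)

# Index of the customers who were actually active in 2015;
# the monetary model will be calibrated on this subsample only
z <- which(in_sample$active_2015 == 1)
head(in_sample[z, ])
summary(in_sample[z, ]$revenue_2015)
```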
Here we're not going to use the multinom function, because the outcome is not constrained to be zero or one; it can be anything. Instead we use lm, which stands for linear model, to fit a linear model matching, as closely as we can, revenue_2015 based on the predictors average amount and maximum amount. And the data here is not the entire in-sample data, but only those customers who can be found in the index z, so only those customers who actually spent something. If we run that model, it should be extremely quick, and we show a summary of it. As you can see, all the statistics here, parameter estimates, standard errors, t-values, everything is highly significant. The R-squared value is 0.60, which is basically a signal of the fit of the model.

But we have a slight issue here. Let's plot, on one hand, how much has been spent by each customer, and on the other, the fitted values of the amount model we've just created through linear regression, which are the values predicted by the model. If we plot that, well, the chart will look terribly ugly. The reason is that most customers have spent pretty small amounts, 50, 60, 70, 100, even $200, while a few outliers have spent huge amounts, up to $3,000 or $4,000. So basically the model is trying to fit a line through a cloud of points where no line is clearly a good fit.

So what we're going to do, very much like what we did in the segmentation module, is calibrate the model on the log of the amounts instead of the raw amounts. Instead of predicting revenue_2015 based on average amount and maximum amount, we're going to predict the log of revenue_2015 based on the log of the average purchase amount and the log of the maximum purchase amount. Doing that, as you can see, the R-squared has improved, meaning that we fit the data much better. And if you look at the plot, it makes much more sense. We've put more weight on the smaller values and less weight on the very large values, and here you can imagine a pretty nice line going through that cloud of points, predicting the revenue of 2015 based on the model we've just created.

To summarize, we have just calibrated two models: the first one, over here, to predict the likelihood that someone will be active, and the second one, over here, to predict how much they will spend if they are active in 2015. Now, the endgame of this exercise is to apply these models to predict the future. So what we are going to do is look at today's behavior, today's data, and extract exactly the same information about today's customers as we used as predictors about a year ago. We're going to extract recency, first purchase, average amount, maximum amount, and everything else for the 2015 customers, about whom, of course, we have no idea who will be active next year or how much they'll spend in 2016. But we can try to predict that with our models. Once you create that data set, it is what is usually called the out-of-sample data set. You have all your customers at the end of 2015, 18,417 people, and you are going to predict their probability of being active in 2016, based on the object we've created, our probability model. The new data is customers_2015, and the type of prediction we are going to make is the actual probability, set through the type parameter of the predict function: the probability that someone will be active or not.
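A sketch of the monetary model, its log-log variant, and the out-of-sample probability prediction, with the same illustrative names as above (predict on a multinom object with type = "probs" returns the fitted probabilities):

```r
# Monetary model in raw dollars, calibrated on active customers only
amount.model <- lm(revenue_2015 ~ avg_amount + max_amount,
                   data = in_sample[z, ])
summary(amount.model)
plot(in_sample[z, ]$revenue_2015, amount.model$fitted.values)

# Same model on a log-log scale, which handles the skewed amounts better
amount.model <- lm(log(revenue_2015) ~ log(avg_amount) + log(max_amount),
                   data = in_sample[z, ])
summary(amount.model)
plot(log(in_sample[z, ]$revenue_2015), amount.model$fitted.values)

# Out-of-sample: predicted probability of being active next year,
# computed for every customer in the 2015 data
customers_2015$prob_predicted <- predict(object  = prob.model,
                                         newdata = customers_2015,
                                         type    = "probs")
```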
So we create a new column, prob_predicted, the predicted probability, in the data set, and that column will contain the predictions from the probability model. We're going to create another column called revenue_predicted, where the predictions come from the amount model, and here we're going to apply it to the entire data set. However, remember that the amount model is actually predicting the log of the revenue. So if you'd like the actual revenue, you need to take the exponential of the predictions, since the log is the inverse of the exponential, and vice versa. So you predict the log of the amount using the amount model, and then you exponentiate that to get the actual predicted revenue. And the score, the actual score of your customers, is the predicted probability multiplied by the predicted amount. So if you have a 10% chance of buying for $100, your score will be 10% of 100, which is 10.

So we run these three lines, and then we summarize the results. If we look at the predicted probabilities, on average the customers in the database have a 22.5% chance of being active. Some customers are predicted to be almost certainly active, with probabilities close to one; other customers are predicted to be almost certainly inactive, with probabilities close to zero; and many are in between. If you look at the predicted revenue, that is, if they are active, how much they are going to spend next year, the average is $65, and it ranges between $6 and $38,000. And of course the lower bound is $6 and not zero, because this model assumes that you will spend something.

The overall score is a function of both the probability and the revenue together, and the score has a mean of 18.8. What does that mean? From a managerial point of view, that value is extremely important. It means that, on average, every customer in this database will spend $18.80 next year. Some will spend zero, a lot of them will spend zero; some will spend maybe $333; some will spend 50, and so on and so forth; but on average it will be $18.80. Some have a score very near zero: we don't expect anything from them. Some have an extremely high score, meaning they are potentially extremely profitable for the firm. And if you look at the histogram, of course, most people are around here; you could actually create a histogram of the log of the score to look into more detail.

But what we'll do is a slightly different exercise. We'll take the customers here, look at their predicted score, and only retain those with a score above $50. If you run that, it will create a vector of the people with a score above $50, which contains a total of 1,323 customers. So in this list of 18,000-plus customers, about 1,300 have a predicted score of $50 or more, and you can see which ones they are: here you have the index of all the customers that have a predicted score above $50. And if you'd like to build targeting applications, if you'd like to identify the customers on which you should spend the most marketing dollars, those customers are the ones with the highest scores, obviously.
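And a final sketch of the scoring and targeting step, again under the same naming assumptions:

```r
# The amount model predicts log(revenue), so exponentiate to get dollars back
customers_2015$revenue_predicted <- exp(predict(object  = amount.model,
                                                newdata = customers_2015))

# Expected value next year = P(active) x conditional spend,
# e.g. a 10% chance of spending $100 gives a score of $10
customers_2015$score_predicted <- customers_2015$prob_predicted *
                                  customers_2015$revenue_predicted

# Summarize and visualize the predictions
summary(customers_2015$prob_predicted)
summary(customers_2015$revenue_predicted)
summary(customers_2015$score_predicted)
hist(customers_2015$score_predicted)

# Targeting: index of customers whose expected value exceeds $50
z <- which(customers_2015$score_predicted > 50)
length(z)   # how many customers qualify
head(z)     # their row indices
```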