Let's take a look at our first experiment where we can measure a result with an analysis of variance, and we'll start with a common experiment that you may have even run yourself: a website A/B test. In an A/B test, visitors come to a website, and some are exposed to one version of the site while others are exposed to another version, hence the terms A and B. We're going to analyze this as an experiment, although obviously such an experiment doesn't take place in a lab, as we might think of most experiments doing, but out there in the wild on the web. First we'll talk about the design considerations of this experiment, then about some of the considerations when we're running the experiment, and then we'll move, as we've done before, to the R code and show how we would analyze this experiment statistically and report the result.
So here's the scenario we'll work with. Let's say that on a given day, 500 visitors to a website are treated as part of the experiment, perhaps the first 500 who visit the website on that designated day. And let's say half of them are exposed to website A and half of them are exposed to a variation of it, website B. Now, that may not be the optimal way to run an A/B test. Perhaps it shouldn't run on just one day, for example, and perhaps it should include more than 500 people, or a certain number of people on a given day. All of those are good variations to consider, but for now we're going to keep it simple and stick to the scenario I described.
We're interested in which website version causes people to view the greatest number of distinct pages. So maybe we think that a redesign of the website, say version B, will have people stay on the site longer and view more pages. Distinct pages viewed will be our measure. You could imagine that in a real-world A/B test we might also record time on site, total page loads or page views, and other measures like that.
Maybe even clicks and things like that. But here we're interested in the number of distinct pages that visitors view.
So let's talk through some of the design considerations in this experiment. First of all, let's think in terms of our variables. I want to introduce the notion of independent variables and dependent variables. You may have heard these terms before, but let's make sure we're clear. Independent variables are the things we are manipulating. That's why they're independent: we're controlling them. So what's our independent variable in this simple website A/B test? It's which version of the site visitors encounter, A or B. Dependent variables are the things that result from our manipulation, sometimes called our treatment, which here is the site they're exposed to. The dependent variable is really the measure, and as I said before, we're interested in the number of distinct pages that are viewed, so we can call that pages.
Now, let's talk in general terms for a moment. The idea behind an experiment is that some measure, Y, is going to change as a result of some independent variable. We write that relation with a tilde, the way R's notation does, as we'll see more of. We just have one independent variable here, so we'll call it X, but if we had more than one, which we will see later in the course, we might have X1 and X2 and X3 and so on. So Y is related to X, and then we have to add one more term, which is traditionally called measurement error. In our case, the idea is that the number of pages viewed might depend on the value that X takes, website A or B, plus measurement error.
What's measurement error? Well, this is actually a very deep issue, but you can think of it as the randomness, or error, or noise that's in the measurements we're taking over people, over the subjects in this experiment. You might say, why is there any measurement error? We know how many distinct pages they visit on the website. That's true.
In that case, we know the measurement of the page count presumably without error, although there could perhaps be some error in our code that's logging it, or maybe some edge case that's not handled. But that's not all that measurement error is. Measurement error in this sense also includes the variation that naturally takes place when we measure things. So it doesn't have to be that we're logging it wrong. It could be that if I measured the same person on Tuesday and then measured them again on Wednesday, they would in fact have a different result. If I measured two different people, they might have a different result due purely to the fact that they're different people, not because the website really is causing that.
These errors are taken to be random, and they are usually assumed to be normally distributed. They are part of any experiment, any measurement. In fact, we don't know how much error may be in a measurement, or how much natural variation there may be, and that's why we need statistical power to draw inferences about the population that we're after. Meaning, we want to know whether there is a true difference between website A and B, in this case, in spite of the fact that we have some error in every single measurement because of the so-called natural variation of any human behavior that we might be measuring. So that's what that term is. It's inescapable, and its exact value, of course, is unknowable. So, in our particular experimental case, we're looking at, as I said, the number of distinct pages being in some relation to the value of site, plus this error.
Now there's something else to be said about the design of this experiment as well, and that is that these variables each have types, and it's important to be aware of variable types. We saw in the previous section that we were recoding the subject variable as a factor, which is R's term for a categorical or nominal variable type.
We also know that there are numeric variable types, also sometimes called continuous or scalar. And there's even a third type called ordinal, or ordered, for variables whose values fall in a sequence that has an order, like a Likert scale, a one-to-seven scale or a one-to-five scale, or short, medium, tall, taller, tallest. Things like that, that have an order to them, are called ordinal. So there are these different variable types, and they affect the kind of analysis that we can do and the results that come out.
So let's take a look at variable types here. What's the variable type for pages? It is numeric, or numerical, or scalar, or continuous, all synonyms. I'll grab this color here and I'll make a note of that. We'll see some analyses where this is not the case, but in our customary analysis of variance situation we'll mostly see that our Y value is numeric. It's a numeric outcome based on certain inputs. But what are those input types? What is the type of X here? It's the site, which can take on two values, A or B. That is called a categorical variable type, or nominal type. So that would be, in our equation here, categorical. So the function we're looking at here says that number of pages, a numeric outcome, could be the result of differences in a categorical input, or independent variable. Okay, so those are variable types, and we'll see them throughout our analyses.
Now, there are other terms that are relevant here, that we'll use more commonly. We probably won't say independent variables much beyond this moment. We'll say factors, because certain experiments we look at in the future will have multiple factors, and they'll be factorial designs. That'll be later in the class. So independent variables can also be called, let me use our other color here, factors, and factors can take on values. Site, in this case, has two values, and those values are called levels of the factor. So we have levels A and B for the site factor.
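To make the relation between pages, site, and error concrete, here is a minimal simulation sketch. It is written in Python purely for illustration and is not the course's R analysis. The mean page counts, the error spread, and all function names are invented assumptions, not values from the experiment.

```python
import random
from statistics import mean

# Hypothetical "true" mean page counts per level; invented for illustration only.
TRUE_MEANS = {"A": 3.0, "B": 4.0}
LEVELS = sorted(TRUE_MEANS)  # the two levels of the categorical factor "site"


def simulate_visitor(site, rng):
    """Return a numeric outcome: the effect of site plus random error."""
    error = rng.gauss(0.0, 1.5)  # natural variation between people and days (assumed spread)
    # Page counts are whole numbers, so round and keep at least one page viewed.
    return max(1, round(TRUE_MEANS[site] + error))


def simulate_experiment(n_visitors=500, seed=1):
    """Simulate one measurement per visitor, each visitor seeing one site version."""
    rng = random.Random(seed)
    rows = []
    for visitor_id in range(n_visitors):
        site = rng.choice(LEVELS)  # categorical independent variable (factor)
        rows.append({"visitor": visitor_id, "site": site,
                     "pages": simulate_visitor(site, rng)})
    return rows


def mean_pages_by_site(rows):
    """Summarize the numeric outcome separately for each level of the factor."""
    return {level: mean(r["pages"] for r in rows if r["site"] == level)
            for level in LEVELS}


rows = simulate_experiment()
print(mean_pages_by_site(rows))
```

Each simulated row has a categorical input (site, with levels A and B) and a numeric outcome (pages). Two visitors on the same site can still differ because of the error term, which is the natural variation discussed above.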
Now, there's one last consideration to take into account, and that is that these factors can also be between subjects or within subjects. Well, what does that mean? Let me write those down. Between subjects, I'll abbreviate here, or within subjects. A between-subjects factor is one for which each subject experiences only one value, or level, of that factor. So in our case each subject would experience either website A or website B, but not both. A within-subjects factor is one for which a participant experiences more than one level of the factor. In this case it would be both website A and B. In a website A/B test, when a visitor comes to a site, they're usually assigned to one or the other variation of the website and not both. A piece of local storage, a cookie, or something similar is put on the machine to remember which site they were exposed to, so each time they go to the site, they get the same one.
So, that's what between-subjects and within-subjects factors are, and when we have multiple factors, some of them can be between subjects and some of them can be within. For a factor to be within subjects, a participant only needs to be exposed to more than one level of the factor. So if we had, say, versions A, B, C and D of the site, and a participant was exposed to A and B but not C and D, it would still be a within-subjects factor. It would be a partial within-subjects factor at that point. So these are some of the design considerations for this website A/B test.
What are some things to keep in mind when we run such a test? This is by no means a comprehensive list of considerations, but it is a few things we'd want to think about. One question is, do we measure each visitor only once? Remember, we're measuring how many distinct pages they view. What if they come back the same day, or what if they come back at a time when they're still within that group of 500 that we said we wanted?
For that matter, how many visitors do we want? Why 500? Should we want more, or fewer? That depends, again, on how big the difference in pages visited is between website versions A and B. If the difference is great, we don't need so many subjects. If the difference is smaller, we may need more subjects to detect it.
Is the split 50/50? Do half the subjects get A and half get B? You can run website A/B tests, of course, with any arbitrary split, say 90/10 or 80/20. In our case, for this data, we'll more or less do 50/50, but the split may depend on an algorithm that assigns people to conditions in a way that could get slightly unbalanced, and so that's a consideration as well. Is the design a balanced design or an unbalanced design? Balanced designs have the same number of data points in every condition. Unbalanced designs do not.
So those are some of the things to think about. For our purposes in this particular study, we will have nearly a 50/50 split, but it comes out, as we'll see, not quite exactly 50/50, and that's okay. We have a total of 500 visitors, and we do measure each visitor only once. So we have one measure per visitor, the number of distinct pages they viewed, on either website A or website B. Let's go now to look at the R code and see how we would do the analysis for this kind of experiment.
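Before turning to the R analysis, the assignment and balance considerations above can be sketched in a few lines. This is a Python illustration under stated assumptions, not how any particular A/B testing tool works. Hashing a visitor ID stands in for the cookie or local storage that remembers a visitor's version, and the visitor IDs and function names are hypothetical.

```python
import hashlib
from collections import Counter

LEVELS = ("A", "B")  # levels of the between-subjects factor "site"


def assign_site(visitor_id, levels=LEVELS):
    """Deterministically map a visitor to one level, so repeat visits see the same site."""
    digest = hashlib.sha256(str(visitor_id).encode("utf-8")).hexdigest()
    return levels[int(digest, 16) % len(levels)]


def condition_counts(visitor_ids):
    """Count how many visitors land in each condition."""
    return Counter(assign_site(v) for v in visitor_ids)


def is_balanced(counts, levels=LEVELS):
    """A design is balanced when every condition has the same number of data points."""
    return len({counts[level] for level in levels}) == 1


visitors = [f"visitor-{i}" for i in range(500)]  # hypothetical visitor IDs
counts = condition_counts(visitors)
print(dict(counts), "balanced:", is_balanced(counts))
```

Because each visitor is assigned independently, the split will usually come out close to 50/50 without being exactly even, which is the kind of slightly unbalanced design described above.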