So we just finished looking at an independent-samples T-test in an AB testing situation with two website variations. We wanted to know whether there was a difference in the number of distinct page views people had on each website. Now things will get a little bit more interesting. Let's consider the scenario of task completion times, a commonly used measure in interaction design and HCI studies. Let's consider the case of authoring tools. Authoring tools might be sketching tools, other design tools, or programming tools. In our scenario here, let's consider programming tools and programming languages, and the time it takes intro-to-programming students to write a series of, say, small programs using different languages and environments. Let's be specific. Let's say we have 40 intro-to-programming students split into two classes of 20 each. And let's say that one class uses the C# programming language in Visual Studio. Let's say the other class uses the Eclipse environment and the Java programming language. And they write a series of programs, and we want to know how long it takes them to complete these programs. Then later, we can add a third group: 20 students from a class using Python in the PyCharm environment. So what are some of the considerations for an experiment like this? Well, a big consideration that will come up in a lot of experiments, really all experiments, is that of experimental control. We want to be able to control the situation so that we know the things we manipulate are responsible for the changes in the measures that we make, and not due to some other factors. For example, in our scenario, we have an educational setting of students in classrooms. Things that may be outside our control are, for example, do the students have different teachers? Maybe some are more effective at teaching programming than others. 
That might be a factor changing their task performance time rather than just the programming environment and language that they use. Can the students get outside help? What if some students get more help than others? What if one of the teachers encourages the students to get help but another teacher doesn't? These and many other things could influence the task completion times that we're interested in and become what are called confounds to the results that we observe. Confounds threaten our ability to draw inferences about the effects that we're manipulating, because they introduce uncontrolled variation in the experiment. Other issues we'd have to consider: what about prior knowledge? What about students who might already know how to program versus those who don't? More control would come from having all students be on equal footing when they start. These might be inclusion or exclusion criteria, like we talked about previously. How are we measuring results? One way to capture data in HCI and interaction design studies, when we're using computational artifacts, is to do automatic logging. Maybe we can write log files or timestamps that tell us something about how long the students are taking to write their programs. There's lots of excitement these days about in-the-wild studies, studies that are outside the lab, in the world, perhaps online, perhaps someone downloads an app from an app store. All of these are very exciting new directions in research studies, but realize that they trade off against experimental control. Realism, or what some people call ecological validity, comes as a trade-off against control, which is, as I've said, the ability to link cause and effect in an experiment. Without control, it's fundamentally hard to know if the changes in the responses we see, which is what we measure, are really due to the changes in the factors that we manipulate. So what are your options for exerting control? 
How can you take steps to make sure that confounds don't threaten your study and you have some known control that you can use to your advantage? Well, there's a series of things that you might try in order to have control. The first is: if something in a study threatens to change the results you see, manipulate it. Manipulating it means treating it as a factor, systematically changing it, making it a variable. So, for example, in our scenario we might want to manipulate the experience of the teacher. Maybe we can recruit teachers of our own. Perhaps this is at some kind of school or camp that we run. And so we actually manipulate the backgrounds of the teachers so that we have experienced teachers, inexperienced teachers, and maybe average-experience teachers, and we can see how experience affects the outcomes. If we can't manipulate something, then the next step might be to control for it. Controlling for it means that even though we can't have low-, medium-, and high-experience teachers, maybe we can make sure all the students get a teacher of the same experience level, whatever that level happens to be. That's called controlling, in this case, for teacher experience. Well, maybe we can't do that. Maybe we don't have the ability to do that in this particular case. If we can't control for it, then let's at least record or measure it. So we can't manipulate it, say, and we can't control it, but we can at least record what the experience of our teachers in this study is. And later we can do an analysis to see, given what we had in the way of their experience, did it seem to make a difference? If we can't even record or measure it, that becomes an interesting case, because we probably aren't even fully aware that it exists. This would be a completely hidden effect, hidden to us. And there's not much you can do in that case because, by definition, it's hidden. 
You're not sure it's really there or what effect it might have. The large bulk of science spends its time searching for hidden effects, things that cause other things in interesting ways that we're not always fully aware of. That's a lot of what we do in science in the first place: searching for hidden effects. So these are options for exerting experimental control and handling things that could be confounds and threaten an experiment. What are some other considerations for actually running an experiment like this? Well, authoring tasks and authoring tools, like writing programs, can take lengthy amounts of time. You could be measuring things in many, many minutes or hours or even possibly days. We'd have to give breaks to people, and we'd have to control the lengths of those breaks and how they're spent by the subjects, again trying to prevent confounds from entering the experiment. We'd probably have to keep people on task in a lab setting for something like this, as opposed to having them out in the wild where all those other influences could come into play. When we're running the experiment, do we set a time limit? What if some students aren't able to finish a program in a reasonable amount of time? If that happens, what do we do in the data table to indicate they didn't finish? If we just give them a very high number, wouldn't that look like it simply took them a long time to complete? How do we show they actually didn't finish? Maybe we have to drop them from our data completely. All of these are context-specific issues that you'd want to take into account when running a study like this. And of course, if we have too many dropouts, that can be an issue, because we lose data or have incomplete data. A good thing to remember is that experiments aren't just about recording numbers. Observe things qualitatively, take notes, and draw your own conclusions about the kinds of behavior that you see. 
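One common way to handle a non-finisher in the data table, rather than recording an arbitrarily large time, is to mark that subject's time as missing. Here's a minimal sketch in R with fictitious values; the column names are just illustrative.

```r
# A small fictitious data table; NA marks a subject who did not finish
# within the time limit, rather than giving them a misleadingly large time.
df <- data.frame(
  Subject = factor(1:4),
  Time    = c(52, 61, NA, 48)  # minutes; NA = did not finish
)

mean(df$Time)                # NA, because one subject's time is missing
mean(df$Time, na.rm = TRUE)  # drops the non-finisher from this computation
```

Marking the value NA makes the missingness explicit, and you then have to decide deliberately, analysis by analysis, how to handle it, rather than letting a fake number silently inflate the group's mean.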
Qualitative observations are very powerful results from experiments, just as quantitative data are. So what's the formal design and analysis for this study? Well, as I said, we have 40 students, 20 each in two classes, and we'll add a third class of 20 others later. What's our response? The response is the term for what we're measuring. The response, as we've said, is time, let's say in minutes. And this would be a numeric value, a numeric variable, sometimes called a continuous variable or a scalar variable. What are our factors? We talked about factors last time. Well, we just have one. We have what we might call the IDE, the integrated development environment, along with the programming language that goes with it. How many levels do we have of this factor? Well, so far we have two: Visual Studio and C#, and Eclipse and Java. And lastly we can ask, is this factor a between-subjects or a within-subjects factor? We know that it's between-subjects, which means each student only experiences one level of the factor. Each student only uses either Visual Studio and C# or Eclipse and Java. So let's now go to our R code and analyze this data to see if the IDE makes a difference in task completion time. Okay, so here we are in RStudio with our R code, continuing our analyses in the Coursera.r file that I've authored. And we're looking at the study of programmers using two different IDEs and programming languages, comparing Visual Studio to Eclipse. Of course, this data is completely fictitious and shouldn't be taken as any kind of endorsement of one programming tool or another. So let's first start by reading in the IDE2 data file. And you'll see me again highlight the lines that we're executing, and then you'll see them execute in the console window down below. I'll assume you'll have this code, and you can follow along and even do these analyses with me as we go. 
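If you don't have the course file handy, the setup can be sketched like this. The few rows of fictitious data stand in for the real IDE2 file, and the column names (Subject, IDE, Time) match the ones described next in the walkthrough.

```r
# A self-contained stand-in for reading the IDE2 data file; in the real
# analysis you'd call read.csv on the file itself. Values are fictitious.
csv <- "Subject,IDE,Time
1,VStudio,312
2,VStudio,340
3,Eclipse,405
4,Eclipse,422"
df <- read.csv(text = csv)

df$Subject <- factor(df$Subject)  # treat Subject as nominal, not numeric
df$IDE     <- factor(df$IDE)      # one between-subjects factor, two levels
summary(df)
```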
And as before, I won't explain every piece of R syntax or function, but I'll try to highlight the major features of important things, and assume that you'll look more and more into how R works and some of its syntactical conventions on your own. Having read that file, let's view it quickly. I know that's quite small in the video, but what we're looking at are three columns: Subject, IDE, and Time, time as in minutes to complete these programs. We're going to change Subject, as is good practice, to a nominal, or categorical, factor. And we can do a little summary of these columns, which we can see below. The mean time, for example, is about 385 minutes to complete this series of programs. Okay, so now we can view some other statistics by IDE. We can see the mean for Eclipse there and the mean for Visual Studio down below. And if we want their standard deviations, we can compute those here too. You'll see this pattern of examining data replicated for most of the datasets that we look at, kind of getting a feel for the data before we dive into an analysis, and that's an important point. It's always a good idea to take a look at data and get a sense of it, to make sure things look about right and there aren't drastic outliers, before you dive right into a statistical analysis. As part of that, we can look at some histograms. So here's a histogram of Visual Studio's times, distributed as such, and we can also see one of the times in Eclipse. Using the little back button here to compare, you can tell that the time in Eclipse looks quite different from Visual Studio's, which is perhaps closer to a bell curve. We can also do a box plot. We can see here that the mean for Eclipse, as we saw in the summaries, was a little higher, but also that the spread around it is quite drastically larger. 
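This descriptive pass, means and standard deviations by group, then histograms and a box plot, can be sketched as follows, again on fictitious stand-in data with the same column names.

```r
# Fictitious stand-in for the IDE2 data (the real file has 20 per group).
df <- data.frame(
  IDE  = factor(rep(c("VStudio", "Eclipse"), each = 4)),
  Time = c(312, 340, 298, 355, 405, 422, 380, 460)  # minutes
)

# Descriptive statistics by IDE: means and standard deviations
aggregate(Time ~ IDE, data = df, FUN = mean)
aggregate(Time ~ IDE, data = df, FUN = sd)

# A histogram of each group's times, then a box plot comparing them
hist(df$Time[df$IDE == "VStudio"])
hist(df$Time[df$IDE == "Eclipse"])
boxplot(Time ~ IDE, data = df)
```

Eyeballing the histograms for shape and the box plot for center and spread is exactly the "get a feel for the data" step described above, done before any formal test.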
So as we did with the website AB test, we can then run a T-test on time as the measure, as it relates to the changes in IDE. And we'll execute that here. This is called the independent-samples T-test, or two-sample T-test. We can see the T statistic here; we showed how to report that before. The degrees of freedom here are 38, and the p-value is less than 0.05. So we would seem to have a significant difference, as the box plot would suggest. Is this a suitable analysis? Well, maybe. But maybe not. We're going to go now back to the glass board and talk about an important set of assumptions that underlie analyses of variance, like the T-test, and these are called the ANOVA assumptions.
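The test itself can be sketched like this on fictitious data. Note that `t.test` defaults to the Welch variant; `var.equal = TRUE` gives the classic two-sample T-test, whose degrees of freedom are n1 + n2 - 2 = 20 + 20 - 2 = 38, matching the value reported above. The group means and standard deviations here are made up for illustration.

```r
# Fictitious task times, 20 subjects per group as in the study
set.seed(42)
vstudio <- rnorm(20, mean = 330, sd = 30)  # minutes
eclipse <- rnorm(20, mean = 440, sd = 80)

# Classic independent-samples (two-sample) T-test;
# var.equal = TRUE assumes equal variances, giving df = 38
t.test(vstudio, eclipse, var.equal = TRUE)
```

Whether that equal-variance assumption, and the others behind this family of tests, actually holds is exactly the question the ANOVA assumptions discussion takes up next.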