Now that we have a new tool in our toolbox, let's join some datasets. We have df_fills and df_max already, but we want to bring in the condition variable: whether the garbage can has a sign on it or not. Let's create a new DataFrame based on daily_max, which has this information. I'm going to merge daily_max with the can info, piping the daily_max dataframe in as a left join with df_cans, which will exclude cans that don't have an average daily_max value. I also want to get rid of cans that don't have a treatment assignment; these are in our dataset, but they weren't chosen as part of this experiment. Now, remember that when you do a group_by, it actually changes the DataFrame by indicating what the grouping should be. We don't want that after this operation, so we have to call ungroup to get rid of the grouping. Then, to make it a bit clearer whether something was in the treatment or the control group, I'm going to change the zeros and ones to Treatment and Control. Nice, a number of our skills are at play here. We can see that we've got a by-date and by-can dataframe, with the new max_fill column holding our outcome variable and Z holding our condition assignment. By the way, in addition to teaching me that they call the color yellow maize, it also seems that at Michigan they pronounce the letter Z as "zee." My apologies if I'm not so clear here. You'll also notice in ggplot that you can use two different spellings for many of the arguments, such as the British spelling colour, which has a u in it, and the American spelling color, which doesn't. It's an interesting design decision that I haven't actually seen in many other libraries, and Hadley Wickham's Kiwi (New Zealand) roots are showing nicely here. Regardless of pronunciation, though, we should try to visualize this data before we do an inferential test on it. 
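The wrangling just described can be sketched like this. Note this is a minimal illustration with toy stand-ins for daily_max and df_cans; the column names (can_id, max_fill, Z) are my assumptions from the narration, not necessarily the actual course data.

```r
library(dplyr)

# Toy stand-ins for the real data (assumed column names)
daily_max <- tibble(can_id = c(1, 2, 3), max_fill = c(60.2, 55.1, 70.8))
df_cans   <- tibble(can_id = c(1, 2, 3, 4), Z = c(0, 1, NA, 0))

df_can <- daily_max %>%
  left_join(df_cans, by = "can_id") %>%   # cans with no daily_max value are excluded
  filter(!is.na(Z)) %>%                   # drop cans with no treatment assignment
  ungroup() %>%                           # clears any leftover grouping from an earlier group_by
  mutate(Z = ifelse(Z == 1, "Treatment", "Control"))

df_can
```

The ungroup here is harmless on ungrouped data, but it matters when the dataframe arrives still carrying a grouping from an earlier summarize.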
My interest here is really to do a sanity check and get an intuition for the data along our hypothesis question: are the two groups actually different? Well, if I plot the average of max_fill for the two Z conditions, I should get nice curves which I can visually compare. Since this is date data by condition, I'm going to group based on Z and our date_reading, then calculate the average for the two groups. Then, of course, we've got to ungroup that data too. Then I'm going to send it to ggplot, where I'll plot both the daily average points and the lines. In this case, I'm going to use the linetype aesthetic to tell ggplot that I want it to vary the line type between treatment and control so that they are different. This is just like color, but a different approach. What do you think? Are these the same? Are they different? Take a moment to go back to the data table above. How else might we visualize this data to get some insight into what it actually means? As a second example, I thought it would be interesting to look at the data by area of the city. Since we only have seven areas, this is a pretty reasonable thing to put into one image. My question to you is: should we use cowplot or facets for this task? Since we're looking to have all of the same plots (lines and points), and we have a variable which indicates our different groups of interest (the area), a facet approach is what we actually want. I'm going to group now by area, Z, and date_reading. The rest of the dplyr work is the same, and the only thing I'm really changing in the ggplot is adding a facet_wrap. Remember, we have to wrap area in vars(); we can't just say facet equals area or toss area in directly. We actually have to wrap it in vars, and then you can set the number of columns and so forth. These all look pretty similar to me by eye. 
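Both plots described above can be sketched as follows. The data here is simulated, and the column names (date_reading, Z, area, max_fill) are assumptions from the narration rather than the actual DC garbage-can dataset.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in data: 5 dates x 2 conditions x 2 areas
set.seed(1)
df_can <- tibble(
  date_reading = rep(rep(as.Date("2017-06-01") + 0:4, each = 2), times = 2),
  Z            = rep(c("Control", "Treatment"), each = 10),
  area         = rep(c("Area 1", "Area 2"), times = 10),
  max_fill     = runif(20, 40, 80)
)

# Daily average by condition; .groups = "drop" plays the role of ungroup()
daily_avg <- df_can %>%
  group_by(Z, date_reading) %>%
  summarize(avg_fill = mean(max_fill), .groups = "drop")

# Points and lines, with the linetype aesthetic separating the two conditions
p <- ggplot(daily_avg, aes(x = date_reading, y = avg_fill, linetype = Z)) +
  geom_point() +
  geom_line()

# Per-area version: add area to the grouping and facet with vars(area)
p_area <- df_can %>%
  group_by(area, Z, date_reading) %>%
  summarize(avg_fill = mean(max_fill), .groups = "drop") %>%
  ggplot(aes(x = date_reading, y = avg_fill, linetype = Z)) +
  geom_point() +
  geom_line() +
  facet_wrap(vars(area), ncol = 2)
```

Note that facet_wrap(vars(area)) is the piece the narration emphasizes: the faceting variable has to be wrapped in vars(), not passed bare.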
I think now we should do the t-test based on the daily max fill across all of the cans. A t-test in R is pretty straightforward: you call t.test and you pass in your two samples, in this case the Control and the Treatment garbage can max_fill values. Should it be a paired t-test or an unpaired t-test? Well, you only do a paired t-test if your sample subjects, in this case individual garbage cans, are the same in the two conditions. That's not true here, so we're going to do an unpaired t-test, which is the default. Now, keep in mind, we're not in the tidyverse anymore; this is straight base R from the first week of this specialization. Be ready for the different syntax, and go back there if you need a refresher. I'll just walk through this. I want to take df_can and pull some data out of it, but only where the Boolean mask df_can$Z == "Control" is true. This comparison broadcasts all the way through and returns a set of trues and falses, indicating whether each row is equal to Control or not, and that masks the can data and thus reduces it. Then I just want to pull out the max_fill variable from that. The second parameter into t.test is exactly the same, except I'm looking for the Treatment condition. If this looks a little unfamiliar to you, just walk through it; I'm pretty sure it will come back to you. What does this tell us? Well, we can see that the mean of the control group was 61.14 percent, while the mean of the treatment group was actually higher, at 61.43 percent full. However, the p-value of 0.62 is quite high. Remember, the p-value ranges from 0 to 1 and gives us a sense of confidence as to whether the two groups are actually different from one another or whether it's just a chance occurrence. A number closer to 1 indicates this is likely to be a chance occurrence, and a number closer to 0 indicates it's likely to be a systematic occurrence. 
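The base R walkthrough above can be sketched like this, using simulated data in place of df_can; the specific means and p-value will of course differ from the lecture's 61.14, 61.43, and 0.62.

```r
# Simulated stand-in for df_can (assumed columns Z and max_fill)
set.seed(42)
df_can <- data.frame(
  Z        = rep(c("Control", "Treatment"), each = 100),
  max_fill = c(rnorm(100, mean = 61.1, sd = 15),
               rnorm(100, mean = 61.4, sd = 15))
)

# Boolean masks select each condition's rows; t.test is unpaired by default
result <- t.test(df_can[df_can$Z == "Control", ]$max_fill,
                 df_can[df_can$Z == "Treatment", ]$max_fill)

result$estimate  # the two group means
result$p.value   # p-value for the difference in means
result$conf.int  # the confidence interval t.test reports
```

The conf.int field is the confidence interval mentioned a little later as an increasingly common alternative to reporting a bare p-value.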
Generally, a parameter called alpha is set ahead of time to determine the level of confidence you are willing to accept. Now I have to confess, this is often poorly done and poorly motivated. One discussion of this, by our friend John Tukey, set an example threshold for alpha at 0.05, which means accepting a result that has only a five percent chance of being a random agreement. However, this has stuck as a hard-and-fast rule, in the social sciences in particular, and it can mean that some promising findings are rejected outright instead of being discussed and followed up on. As your observations grow in number, your p-value will shrink. An experiment that has a dozen garbage cans and a p-value of 0.05 actually has a much stronger signal than an experiment which has 1,000 garbage cans and a p-value of 0.05. The meaning of the p-value hasn't changed, but how we interpret it and what we do with it might. You also have to think about effect size: how big the difference actually was, if one was observed. And you must think about the cost of a given intervention when you're a decision-maker. This intervention is cheap and could easily be rolled out to all the garbage cans in DC, but our effect is actually in the wrong direction; even if we were confident there was a difference, we wouldn't want to put these signs up. In much of my work, I consider a range of p-values and use them to inform my next action. For instance, if I have a relatively small study and I get a p-value of 0.15, I'll usually think about what noise might be in the study and the different factors that impact the outcome, and then plan a larger replication study. This is a critical part of my scientific inquiry, and I think things are slowly changing away from strict alpha values with the introduction of more modern methods. One that's rapidly being adopted in medicine and health in particular is the use of confidence intervals, which are actually reported here if you go back. 
Another that's being used more heavily in the computational social sciences is Bayesian inferential methods. Speaking of Bayesian methods, the lab in DC didn't actually do a t-test like we did, although they came to a similar conclusion: there's likely no big difference between these groups. They used a Bayesian predictive model to compare the two groups, which is a bit out of scope for this class, although a reasonable approach for them to be using. Their code and data are all freely available on GitHub, a centralized place that many people use to share code and data, and since it's all written in R, you can dig in and see what they've done if you want to.