Let's take the example of an intervention aimed at people who are chronically homeless. This is a problem in any major city on the planet, and some cities in particular have very high rates of chronic homelessness. We also know that in a lot of places, people who are chronically homeless have much higher rates of being arrested and much higher rates of using the Emergency Department, the hospital, or other kinds of healthcare services.

An intervention was designed to assist people who are chronically homeless by giving them housing, but not just housing: what's referred to as supportive housing. This is housing with additional social welfare services, often implemented through a case management approach. So our intervention here is supportive housing with case managers who assist the participants in getting connected with social welfare services and healthcare services.

In the pretest in our study population, we had arrest rates of 65 percent in a 12-month period, and 75 percent had at least one Emergency Department visit or hospitalization. The intervention was implemented over a 12-month period, and our data show that in the post-test period, the arrest rate went down to 25 percent and the Emergency Department and hospitalization usage rate went down to 40 percent.

Now, what does that look like to you? Does that look like a pretty effective intervention? It's designed to assist people with housing, but not only with housing: it's also designed to see if there's a decrease in the use of these other city resources, the criminal justice system resources and the healthcare resources.

Well, let's think about this design from an internal validity point of view. What are some rival hypotheses to the conclusion that it's the intervention that caused these drastic, significant drops in arrest rates and healthcare usage? Maturation.
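As a quick sketch, here is how the pre/post changes quoted above work out, using the rates from the lecture (65% to 25% for arrests, 75% to 40% for Emergency Department or hospital use):

```python
# Pre/post outcome rates from the lecture's supportive-housing example.
pre_arrest, post_arrest = 0.65, 0.25
pre_ed, post_ed = 0.75, 0.40

def change(pre, post):
    """Return the absolute and relative reduction for a pre/post rate pair."""
    absolute = pre - post
    relative = absolute / pre
    return absolute, relative

arrest_abs, arrest_rel = change(pre_arrest, post_arrest)
ed_abs, ed_rel = change(pre_ed, post_ed)

print(f"Arrests: {arrest_abs:.0%} absolute drop ({arrest_rel:.1%} relative)")
print(f"ED/hospital: {ed_abs:.0%} absolute drop ({ed_rel:.1%} relative)")
```

The point of the example that follows is that these drops look large, but the simple pre/post arithmetic by itself cannot tell us how much of the change the intervention actually caused.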
Would this have happened anyway? We looked at them for 12 months with high rates of arrest and high rates of healthcare usage. If we looked at them again a year later, would they look better just because of underlying fluctuations and changes in being arrested or in having these healthcare shocks, as we might say? They might have improved anyway without the intervention; we don't know. Was there a measurement change from the time of the pretest to the post-test? We've got to look into that.

Regression to the mean. I tried to explain that before, but I think with this example you might see it. These are people at the extremes of a distribution: people with high arrest rates and people with a lot of emergency healthcare usage that lands them in the Emergency Department and the hospital. Even without the intervention, is it possible that these people might have had a better second year just because of regression to the mean? Actually, we know from data that that is the case.

How do we control for these important threats to the internal validity of this research design? The simple pretest/post-test design shows there could have been a pretty important and significant impact of this intervention, but we do have some threats that we need to worry about. How do we control for them? Well, usually we do that with a control or comparison group. We would have an experimental group who got the intervention, and a similar group that looks the same but without exposure to the intervention. But now we have to worry about another threat to internal validity, and that is the threat of selection bias. This is an important threat, but we only have to worry about it if we have a control group.
Again, we have the experimental group and the control group, and if we're comparing the two, we want to say the control group is the counterfactual, so that any differences we see between the two are the effect of the intervention. The rival hypothesis here is that maybe the control group isn't a good counterfactual. Maybe there are either observed or, more concerning, unobserved differences between the people in the experimental versus the control group, and it's these differences that are causing the change over time, not the intervention. Just because we have a control group doesn't mean that we have good internal validity; we have to ask, is that a good control group? Is there selection bias, or some difference between those two groups, that could lead us to see intervention effects when there aren't any?

We've really been focusing on the internal validity of a research design and all these different threats to internal validity that have very technical terms. Internal validity: did the intervention actually make a difference in the setting under study, and are we measuring the effect of the intervention on an outcome without these threats, which all have technical names?

Now quickly, as opposed to internal validity, what is external validity? We also want to think about whether the findings of our program or policy evaluation are generalizable outside of the circumstances of the evaluation. Can we generalize to different jurisdictions or places? I think that's typically what people have in mind when they think about generalizability. Well, if we did this supportive housing intervention study with the chronically homeless in, let's say, Rome and got results, are those generalizable or transferable to cities in Africa, to cities in Southeast Asia, to cities in the United States?
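One common way to use a comparison group is a difference-in-differences calculation: subtract the change the comparison group shows anyway (maturation, regression to the mean) from the change in the experimental group. The comparison-group numbers below are made up purely for illustration; they are not from the study:

```python
# Experimental-group arrest rates from the lecture; comparison-group
# rates are HYPOTHETICAL numbers for illustration only.
treat_pre, treat_post = 0.65, 0.25
ctrl_pre, ctrl_post = 0.62, 0.45   # comparison group improves on its own

# Naive pre/post estimate credits the intervention with the whole drop.
naive = treat_pre - treat_post

# Difference-in-differences removes the change that happened anyway.
did = (treat_pre - treat_post) - (ctrl_pre - ctrl_post)

print(f"Naive pre/post effect: {naive:.2f}")
print(f"Difference-in-differences effect: {did:.2f}")
```

Note that this arithmetic only helps if the comparison group really is a good counterfactual; if selection bias put systematically different people in the two groups, the difference-in-differences number is biased too.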
Can we generalize from one jurisdiction and population to other places? Sometimes we can, sometimes we can't, so we want to think about that. But with external validity we also want to think about whether there are issues with generalizing outside of the setting of the research study. Was there something about the way we were conducting the research that affects the size of the intervention effect, or in some other way affects what we're seeing? We'll talk more about that later. But again, for all these fun, jargony terms, what I really want you to be thinking about is internal validity, the strength of a research design to establish causal relationships, and the threats to internal validity and their technical names.