In this video, we will introduce instrumental variables. We will begin by discussing unmeasured confounding and why it causes a problem for standard causal inference methods. Then we will introduce instrumental variables, discuss their properties, and begin to motivate why they might be useful in unmeasured confounding situations. So just as a review, this is a kind of classic confounding setting here, where we have treatment A and outcome Y, and there are some confounders X. So X here is affecting treatment and it's affecting the outcome directly. But as long as we observe X, we could use a causal inference method such as matching, propensity score matching, or inverse probability of treatment weighting, and we could still estimate the causal effect of A on Y. That's true even if there were some variables that only affected Y, which I'll call V. These are known as risk factors, variables that only affect the outcome. We would still be fine to just control for X. So even a DAG like this doesn't cause a problem. We could simply control for X using one of these causal inference methods, and it would be fine. We could also control for V, and that wouldn't be a problem. However, in many cases, we might be concerned that there's unmeasured confounding, so I'm depicting that in this DAG here with the variable U. So we'll think of X as something that we observe. There's some set of variables that we observe, but there also might be some variables that we don't observe. I'll call those U. And I'm using these dashed arrows just to emphasize that U is different from the Xs, in the sense that we don't observe it. So if you had unmeasured variables that affect both treatment assignment and the outcome, then we would have a problem. The ignorability assumption would be violated. So, conditional on X, treatment still couldn't be thought of as randomized. In that case, we would have biased estimates of causal effects.
Because remember, for causal inference methods that control for confounding, what we really do is condition on the confounders and then average over them. So if you are matching, you are pairing up people based on X. Then we estimate treatment effects within those matched pairs and average over the distribution of X. Or with inverse probability of treatment weighting, we create this pseudo-population where there is balance on X in the treated and control subjects. But we couldn't create that kind of balance on the unmeasured confounders just by conditioning on X. Same thing with matching: we can't match on U directly since we don't observe it, so we couldn't actually get an average causal effect. So that's motivating the need for different statistical methods. Imagine that we're pretty sure in our study that there's unmeasured confounding. So we know that these standard methods that control for confounding are not going to be good enough. Well, it's not completely hopeless, as long as you can identify an instrumental variable. For an instrumental variable, I'm going to use Z as the notation. So Z is our instrument. And the main idea here is that the instrumental variable Z is going to affect treatment. So you could think of it as affecting treatment assignment, but it's not going to directly affect the outcome. You'll notice that there's no direct arrow from Z to Y. You could think of Z as being a sort of randomizer, but it's randomizing to encouragement. So, for higher values of Z, for example, you might be more encouraged to get the treatment, and for lower values of Z, less encouraged. We'll think of it as randomized. So imagine that we're randomly either more aggressively or less aggressively motivating people to receive some treatment. But this motivation, we'll imagine, does not directly affect the outcome. So this is what we mean by an instrumental variable.
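As a rough illustration of why unmeasured confounding is a problem, here is a small simulation sketch in Python. Everything in it is an assumption made for illustration and not part of the lecture: the linear outcome model, the coefficient values, the true effect of 2, and the function names are all hypothetical. It compares adjusting for X only (what we can do in practice) with adjusting for X and U (which we cannot do with real data), and it checks that a randomized Z shifts treatment while being unrelated to U.

```python
import numpy as np


def simulate(n=200_000, true_effect=2.0, seed=0):
    """Simulate a DAG with measured X, unmeasured U, and a randomized Z.

    All numbers here are made up for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)           # measured confounder X
    u = rng.normal(size=n)           # unmeasured confounder U
    z = rng.integers(0, 2, size=n)   # randomized encouragement Z (0 or 1)
    # Treatment A depends on X, U, and Z.
    latent = 0.8 * x + 0.8 * u + 1.0 * z + rng.normal(size=n)
    a = (latent > 0.5).astype(float)
    # Outcome Y depends on A, X, and U, but has no direct arrow from Z.
    y = true_effect * a + 1.0 * x + 1.0 * u + rng.normal(size=n)
    return x, u, z, a, y


def treatment_coefficient(y, a, *covariates):
    """Least-squares coefficient on A when regressing Y on A and covariates."""
    design = np.column_stack([np.ones_like(a), a, *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


x, u, z, a, y = simulate()

# Feasible analysis: control for the measured confounder only.
adjust_x_only = treatment_coefficient(y, a, x)
# Infeasible analysis: control for U too (only possible in a simulation).
adjust_x_and_u = treatment_coefficient(y, a, x, u)

print("true effect:            2.0")
print(f"adjusting for X only:   {adjust_x_only:.2f}")
print(f"adjusting for X and U:  {adjust_x_and_u:.2f}")

# Z is randomized, so it should be essentially uncorrelated with U,
# yet it changes how often people receive treatment.
print(f"corr(Z, U):             {np.corrcoef(z, u)[0, 1]:.3f}")
print(f"P(A=1 | Z=1), P(A=1 | Z=0): {a[z == 1].mean():.2f}, {a[z == 0].mean():.2f}")
```

In this simulated setup, the X-only estimate should land well away from the true effect, while the estimate that also uses U should be close to it. That gap is the bias that conditioning on X alone cannot remove.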
So even though there is confounding, and maybe even unmeasured confounding, there's this one part where there is randomization. Some part of treatment is being explained by something that's random. And so that gives us hope of being able to estimate a causal effect even if there's unmeasured confounding. So as an example, imagine that we're interested in whether smoking during pregnancy affects birth weight. Our treatment, if you want to call it that (you could also think of it as an exposure), would be smoking during pregnancy, so just a yes or a no. And then our outcome is birth weight. And you can imagine there might be all kinds of confounders, so variables that might be related to the decision to smoke or not, and also related to birth weight. It could be the mother's age, for example, whether she's given birth before, her weight, and so on. But there also could be unmeasured confounders, so it's possible that we haven't collected enough variables to really explain why the decision is made to smoke or not, and also to capture the variables that would affect birth weight. So we might be worried about unmeasured confounding. And we also have this problem that if we wanted to run a randomized trial, we wouldn't really be able to randomize pregnant women to smoke or not. There would obviously be ethical problems there with asking people to smoke or asking them not to smoke. What we could do, though, is have what's known as an encouragement design. So we could actually carry out a randomized trial, and what could be done here is to enroll women who are smokers into the study. So imagine a population of people who smoke, and then once they're pregnant, you could randomize them to one of two groups. One is this usual care group, where they just get whatever they normally would. So they get the same amount of prenatal care that they normally would.
If their physician would normally discourage them from smoking, that would still happen; they just get whatever usually happens. So that's the Z=0 group. But they also could be randomized to the intervention arm, which would be an arm where there's extra encouragement to stop smoking. So there'll be some kind of active intervention aimed at preventing smoking. And this actually was done in a 1984 paper, where they did carry out an encouragement trial like this. So we're not directly randomizing people to smoke or not, we're just randomizing them in terms of encouragement to stop smoking. Some people get a big dose of encouragement to stop smoking, and others get usual care. So in that example, we could think of Z as an instrument, because the encouragement to stop smoking should not affect birth weight directly. If I just encourage you to not smoke, that should not affect the birth weight of your child, other than through the impact it might have had on you to stop smoking. So it should only affect the outcome through its effect on the exposure. Now we'll start to think about what we could estimate from this kind of design. One thing you could do is what's known as an intention-to-treat analysis, where we just compare the outcomes under the two intervention arms. In other words, we could focus on the causal effect of encouragement. So here I'm writing that as a contrast of potential outcomes, and you'll notice these potential outcomes are indexed by Z here. For example, we have the expected value of Y with superscript Z = 1. You could think of this as the average birth weight if we had given everybody in the whole population encouragement to quit smoking. So imagine everybody had actually been in the encouragement arm. What would the average birth weight be? And we could then contrast that with the average birth weight if everyone had been given usual care.
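Written out, the contrast just described, with potential outcomes indexed by the encouragement Z, is:

```latex
\[
  E\left(Y^{Z=1}\right) - E\left(Y^{Z=0}\right)
\]
```

Here the first term is the average birth weight if everyone had been encouraged to stop smoking, and the second is the average birth weight if everyone had received usual care.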
Okay, so then that difference would be a causal effect. And it would be a causal effect of encouragement, not a causal effect of smoking itself. So what impact did this encouragement have on birth weight? That's known as an intention-to-treat effect. It's focused on the as-randomized comparison. What is the thing we're actually randomizing? In this case it's encouragement. And so then we could estimate a causal effect of encouragement itself. That's potentially of some interest, especially if you want to know what impact this kind of active intervention would have on outcomes. However, it's not directly getting at the causal effect of smoking itself, because we're not contrasting potential outcomes where the index of these potential outcomes is treatment itself, which here is smoking. But instrumental variable methods are going to try to use the instrument, this randomization, this encouragement, to get at the actual causal effect of treatment itself. So in this case, the actual causal effect of smoking. Now, in this encouragement study, we actually have randomized encouragement. This was an actual randomized trial where encouragement was randomized. There are plenty of cases like that, where our instrumental variable is something that we physically randomized, say with a random number generator. But other times, there might be instrumental variables that we believe exist, but they just exist in nature, so it's more of a natural experiment. They're not necessarily literally randomized by an investigator. Some examples that people have considered are Mendelian randomization in genetic studies. Quarter of birth has also been used, so which quarter of the calendar year you were born in has been used as an instrument, and it has been shown to affect how many years of education you get.
But quarter of birth is not directly randomized by investigators, though there is reason to think that it shouldn't be related to some other outcomes, at least not directly. Another common example is geographic distance to specialty care providers, where the distance itself is thought of as a sort of natural randomizer. In future videos, we'll discuss these examples and others in more detail. But here I just want to introduce this idea that we think of instruments as encouragement to get the treatment. In some cases, we're actually randomizing this kind of encouragement. And in other cases, we just believe that this kind of encouragement exists in the real world. So we'll look at both of those in future videos.
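To make the encouragement design a bit more concrete, here is a minimal Python sketch of an intention-to-treat comparison in a simulated trial. The data are entirely made up: the sample size, the smoking probabilities, the birth weight numbers, and the variable names are hypothetical and are not taken from the 1984 study. The sketch only compares outcomes by randomized arm, so it estimates the effect of encouragement, not the effect of smoking itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Z: randomized arm (1 = extra encouragement to stop smoking, 0 = usual care).
z = rng.integers(0, 2, size=n)

# U: an unmeasured confounder that affects both smoking and birth weight.
u = rng.normal(size=n)

# Everyone enrolled is a smoker at baseline. Encouragement lowers the
# probability of continuing to smoke during pregnancy (hypothetical numbers).
p_smoke = 1.0 / (1.0 + np.exp(-(1.5 - 1.0 * z + 0.5 * u)))
smoke = rng.binomial(1, p_smoke)

# Birth weight in grams. Z appears only through smoking, so there is
# no direct effect of encouragement on the outcome.
birth_weight = 3400 - 200 * smoke - 100 * u + rng.normal(0, 400, size=n)

# Because Z is randomized, a difference in arm means estimates the
# intention-to-treat effect, the causal effect of encouragement.
itt = birth_weight[z == 1].mean() - birth_weight[z == 0].mean()

# Encouragement should also change the exposure, or it is not a useful instrument.
smoke_rate_encouraged = smoke[z == 1].mean()
smoke_rate_usual_care = smoke[z == 0].mean()

print(f"smoking rate, encouragement arm: {smoke_rate_encouraged:.2f}")
print(f"smoking rate, usual care arm:    {smoke_rate_usual_care:.2f}")
print(f"ITT effect on birth weight (g):  {itt:.1f}")
```

In this simulated setup, the intention-to-treat effect is much smaller than the 200 gram effect of smoking built into the data, because encouragement only changes smoking behavior for some of the women.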