Hi, this video is on confounding. In this video, we're going to briefly discuss what confounding is and how it relates to the ignorability assumption. We're also going to use that information to motivate why we will want to learn about causal graphs, which will be covered in future videos.
As a brief review, we are interested in the relationship between means of different potential outcomes. Potential outcomes are the outcomes that you would observe if you set treatment to certain values, such as treated versus untreated. So for example, we might be interested in the mean difference in the outcome if everybody was treated versus if no one was treated. As we've seen previously, to be able to estimate this from observational data, we will need to make several assumptions, including ignorability. Ignorability refers to the treatment assignment being independent of the potential outcomes conditional on some set of covariates X.
As an example, suppose treatment assignment depends on the potential outcomes, so that sicker patients are more likely to be treated. In this case, the ignorability assumption would be violated. It's then also true that treated patients, being sicker to begin with, might be at higher risk of a bad outcome. We would therefore need to account for these pre-treatment differences in health.
But now suppose X consisted of several measures of health: history of various diseases, age, weight, smoking, alcohol use, and so on. Imagine we've captured a large number of these measures of health, and we'll just call that collection of variables X. It then could be the case that within levels of X, in other words among people who are the same age, have the same history of comorbidities, are of the same weight, have the same smoking history, and so on, the sicker patients are not actually more likely to get treatment. So that's what we would mean by ignorability. We cannot actually ignore treatment assignment in general.
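For reference, the quantities just described can be written compactly. The notation here is an assumption, since the transcript does not show the slides: $Y^{1}$ and $Y^{0}$ stand for the potential outcomes under treatment and no treatment, $A$ for the treatment actually received, and $X$ for the measured covariates.

```latex
% Target quantity: mean outcome if everybody was treated
% versus if no one was treated.
\[
  E\left[Y^{1}\right] - E\left[Y^{0}\right]
\]

% Ignorability: within levels of X, treatment assignment is
% independent of the potential outcomes.
\[
  Y^{0},\, Y^{1} \;\perp\!\!\!\perp\; A \mid X
\]
```

The second line does not say that treatment is unrelated to the potential outcomes overall, only that it is unrelated once we compare people with the same values of $X$.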
In this case sicker patients are more likely to get treated, and are more likely to have a bad outcome. But if we capture enough information, in this case about the health of the individuals, then as long as we condition on that, we are able to say that treatment assignment is random. So if we make the population of patients homogeneous enough, then we could think of treatment assignment as random. In that case we would consider treatment assignment ignorable, meaning the ignorability assumption would hold.
Now we should think about confounding. In the previous example, you could generally think of health as a confounder, because it was affecting both the probability of treatment and the outcome itself. In general, you could think of confounders as variables that affect both treatment and the outcome. It's important to note that when we say that a variable affects the outcome, we mean independent of treatment. In other words, not through its impact on treatment.
The first example I'll give here is something that would actually not be a confounder. Imagine that treatment assignment was actually random, which you could think of as being based on a coin flip. In that case, the assignment mechanism, the coin flip, would affect treatment, but it should not affect the outcome. There's no reason to think that a coin flip would affect your typical outcome. So in this case the coin flip itself, the assignment mechanism, would not be a confounder, because it's only affecting treatment and not affecting the outcome.
Another example of something that's not a confounder would be if people with a family history of cancer are more likely to develop cancer. In this case let's say that cancer is the outcome, but perhaps family history was not a factor in the treatment decision.
In that case, the variable that we're thinking of here, family history of cancer, would affect just the outcome and not the treatment decision, so it's not a confounder. In fact, a variable that only affects the outcome is sometimes referred to as a risk factor. So this is not a variable that we would need to worry about when it comes to the ignorability assumption.
Finally, imagine that older people are at higher risk of cardiovascular disease, which here is the outcome, but are also more likely to receive statins, which here is the treatment. In that case, age would be a confounder. Age is affecting both the treatment decision, which here is whether or not to receive statins, and is also directly affecting the outcome, which is cardiovascular disease.
So, when it comes to confounder control, we are interested in first identifying a set of variables X that will make the ignorability assumption hold. Remember, the ignorability assumption is saying that treatment assignment is random given X, and X is a collection of variables. So the question is, what collection of variables? We would need to identify what that collection of variables is. If we're able to find a set of variables like this, then that's sufficient to control for confounding.
We're also interested in statistical methods. Imagine that we have these variables X, so the ignorability assumption would hold, and then we have hope of estimating causal effects. But what are the actual statistical methods that we would use to do this? That will be a focus later in the course.
Finally, we come to causal graphs. Causal graphs, which we'll introduce in a future video, will be used to help us identify which variables we will need to control for. This is actually not an easy question: what is the collection of variables that would make the ignorability assumption hold?
We will see that causal graphs will help us to answer that question. So again, the goal is to find a set of variables X that will achieve ignorability, which is equivalent to saying that we'll find a set of variables X that will be sufficient to control for confounding. Causal graphs are going to help us answer this question, and they will also help to formalize these key ideas.
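To make the age and statins example concrete, here is a minimal simulation sketch in Python using NumPy. Everything in it is an assumption made for illustration and not something from the lecture: the variable names, the 50/50 split into younger and older people, the treatment probabilities, and a built-in statin effect that lowers the risk of cardiovascular disease by 0.05 in both age groups. It compares a naive treated-versus-untreated difference with one computed within levels of age and then averaged over the age distribution.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Simulate a hypothetical population where age confounds statins and CVD."""
    rng = np.random.default_rng(seed)
    # X: 1 = older, 0 = younger (assumed to be half the population each).
    older = rng.binomial(1, 0.5, size=n)
    # Treatment: older people are more likely to receive statins (0.8 vs 0.2).
    statin = rng.binomial(1, 0.2 + 0.6 * older)
    # Outcome: age raises CVD risk by 0.30; statins lower it by 0.05 (assumed).
    cvd = rng.binomial(1, 0.10 + 0.30 * older - 0.05 * statin)
    return older, statin, cvd


def naive_difference(a, y):
    """Mean outcome among the treated minus mean outcome among the untreated."""
    return y[a == 1].mean() - y[a == 0].mean()


def adjusted_difference(x, a, y):
    """Treated-minus-untreated difference within each level of x,
    averaged over the distribution of x in the whole sample."""
    total = 0.0
    for level in np.unique(x):
        in_level = x == level
        diff = y[in_level & (a == 1)].mean() - y[in_level & (a == 0)].mean()
        total += in_level.mean() * diff
    return total


older, statin, cvd = simulate()
naive = naive_difference(statin, cvd)
adjusted = adjusted_difference(older, statin, cvd)
print(f"naive difference:    {naive:+.3f}")
print(f"adjusted difference: {adjusted:+.3f}")
```

Under these assumed numbers the naive comparison comes out clearly positive, so statins look harmful, because the treated group is mostly older people. The age-adjusted comparison lands close to the built-in effect of -0.05. This only works here because age is, by construction, the single confounder, which is exactly the kind of knowledge that causal graphs are meant to help us reason about.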