So, in this section, we'll consider several prominent study design types in the world of public health research. At the end of this lecture section, you will be able to describe the similarities and differences between the randomized cohort, observational cohort, and case-control study designs; explain the major analytical challenges that come from comparing outcomes across groups where the group membership has not been randomized; and start to become aware of some of the major issues to consider when making conclusions based on study results, in other words, when mapping from the statistics we get to scientific, clinical, substantive conclusions. So, as for the common study designs we'll deal with in this course, we're going to focus on something known as prospective cohort studies, but we'll give a little shout-out to case-control studies along the way as well. Within prospective cohort studies, there are two different types: one is called a randomized controlled study design and the other is called an observational cohort study. In both cases, the subjects under study are classified as to their group status, usually defined by some sort of exposure, at the start of the study and then followed over time to see who develops the outcome or outcomes of interest. So, in these, the exposure precedes the outcome in time. In case-control studies, subjects are chosen based on their outcome status and then the exposures that occurred prior to the outcome are assessed. So, again, theoretically the exposure precedes the outcome in time, but people are chosen based on whether they've had the outcome or not, and then their exposure prior to that outcome is assessed via medical records or recall from the participants themselves. So, let's first talk about comparing outcomes across different groups via prospective cohort studies.
So, if we were doing a randomized study, what we would do is get a representative sample from the general population under study, and then the researchers would randomly assign the persons in the sample to different exposure groups: randomly assign them to get a treatment or to be in the control group, or randomly assign them to intervention one, intervention two, or a control group, for example. Then we can compare differences in results between the two interventions and how they compare to control. For an observational study, we can either start the same way, where we get a representative sample from the general population under study and then ascertain group membership, or sample from each group separately. So, if we wanted to do a study to compare outcomes between smokers and non-smokers, we could take a random sample from the general population under study, population A, and then ascertain whether persons in the sample are smokers or non-smokers at the time of the study, for example. Or, if we wanted to ensure an equal number of smokers and non-smokers in our study, we could instead sample randomly from a population of smokers and sample randomly from a population of non-smokers to get the desired number in each of the groups. But in either case, participants' group membership is based on something they've done, some behavior or characteristic of theirs, that is not assigned by the researcher as it would be in a randomized study. So, randomized trials or experiments are important. They're important in the study of medicine and public health, and they're important for protecting against many kinds of bias. Randomization done correctly with a large number of subjects nearly ensures that the only systematic difference between the groups being compared is the exposure of interest.
That way, the differences we see, if we see any, between the exposed and unexposed groups are not fueled by other systematic differences between those groups that may be related to the outcome as well. So, let's look at a very famous randomized trial. This is actually epic for many reasons, not least that it was done in the pre-computer age. This would be hard to do even in the computer age, and it was done without the aid of computers. So, this was done in the US in 1954: the 1954 trial of the Salk polio vaccine. Over 400,000 school children were randomized; 200,745 were vaccinated for polio, and the other 201,000-plus were given a placebo. So, this was a very large undertaking. It's rare to find trials conducted on this scale in one country like the US these days. So, at the end of the study follow-up period, there were 82 cases in the vaccine group and 162 in the placebo group: about half as many cases in the vaccine group, which was of similar, though not exactly the same, size as the placebo group. We'll go through the statistics that we'll need to determine whether we could see such a difference by chance if there were no difference in efficacy between the vaccine and no treatment, and we'll talk about the implications of that for similar types of studies throughout this course. But as you know, the researchers ultimately concluded that this vaccine is very effective, and it has become part of the standard childhood vaccine schedule. Subsequent analyses report slightly different numbers than the ones I'm showing you here because some false positives were ultimately discovered in each of the two groups. So, what are the benefits of randomization? Randomization helps protect against self-selection biases. I'm going to give you some examples of such biases.
Maybe if we allowed children, or their parents, to choose which of the groups they got put in, parents of male children would be more likely to volunteer their children for placebo than the parents of female children, and we would have a biological sex imbalance between the vaccine and placebo groups. Or maybe, in another clinical trial, not one in children, smokers would be less likely to be in the exposed or treatment group because they self-selected to be in the placebo group. Or maybe healthier persons would sign up for an intervention rather than a placebo. Well, none of this will happen if the choice of who gets the intervention or treatment and who gets the control is determined by the researchers, through randomization, and not by the participants. So, how might you randomize persons to two groups, treatment and control? Well, an old-fashioned way to do this might be to just pull out a coin and flip it. If it comes up heads, you randomize the person to treatment. If it comes up tails, you randomize them to control. What if you had more than two groups? What if you had three groups? Well, you might pull out a classic six-sided die and roll it, and if it comes up a one or a two, you would assign them to group one. If it comes up a three or a four, you would assign them to group two. If it comes up a five or a six, you would assign them to group three. Of course, all of these, and more complex randomization schemes, can be set up on a computer, which is usually the method researchers use nowadays for randomizing. So, the goal of randomization is to eliminate any systematic differences in the characteristics of the subjects in each of the exposure groups under study, save for the exposure itself. So, we want the only systematic difference between the children who got the polio vaccine and those who got the placebo to be getting the vaccine or the placebo.
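To make the coin flip and the die roll concrete, here is a minimal sketch of how the same idea might be set up on a computer. It is only an illustration, not the procedure used in any particular trial: the participant IDs, the `randomize` helper, the group labels, and the seed are all hypothetical.

```python
import random
from collections import Counter


def randomize(participants, groups, seed=None):
    """Assign each participant to one group, every group equally likely.

    With two groups this mimics a fair coin flip; with three groups it
    mimics the die rule (1-2, 3-4, 5-6) described in the lecture.
    """
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    return {person: rng.choice(groups) for person in participants}


# Hypothetical participant IDs, for illustration only.
participants = [f"P{i:04d}" for i in range(1, 6001)]

two_arm = randomize(participants, ["treatment", "control"], seed=1)
three_arm = randomize(participants, ["group 1", "group 2", "group 3"], seed=1)

# Count how many participants landed in each group.
two_arm_counts = Counter(two_arm.values())
three_arm_counts = Counter(three_arm.values())
print(two_arm_counts)
print(three_arm_counts)
```

Because each person is assigned independently, the groups come out similar in size but usually not exactly equal. The key design point is that nothing about the participant enters the assignment, so no characteristic of theirs can systematically steer them into one group.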
But of course, randomization is not always possible. That is unfortunate for scientific purposes, but the limits are certainly important for ethical purposes, and for health in general this is a good thing. So, for example, we certainly could not, ethically or operationally, do a randomized study of a risk factor for a certain disease such as smoking. If we wanted to isolate smoking and see what its impact was, we could not randomize persons to be smokers or non-smokers. But we still want to study this important exposure, and certainly other exposures of interest. Things like ethnicity, environmental exposures, et cetera, cannot be randomly assigned, but it's important to study their association with public health-related outcomes. So, what can we do? Well, there's another type of prospective cohort study called an observational cohort study. Again, here the exposure will precede the outcome in time, but the exposure is not determined by the researcher. Observational studies are studies in which subjects "self-select" to be in the exposure groups; they are not randomized. Sometimes, this is the only type of study that can be done. So, here again is a situation where the outcome and exposure relationships are of interest, but sometimes they're difficult to directly assess because of selection bias issues, which may lead to systematic differences between the exposure groups other than the exposure of interest. So, some examples: maybe smokers are more likely to drink alcohol, which may be related to the outcome of interest as well, like coronary heart disease or something. Or vegetarians, those with a plant-based diet, may be more likely to exercise and to adopt other healthful behaviors, so isolating the effect of diet type on the outcome will require taking those other differences into account. So, let's look at an example of an observational cohort study.
This has been replicated many times, actually, in different cities and situations, but it's a difficult thing to study because of the inability to randomize. This is one of the seminal studies on the potential impact of a needle exchange program for intravenous drug users on reducing the risk of HIV infection. It was based on a needle exchange program in New York City, and the question has been studied over and over again in multiple cities across the world. What the authors ultimately reported was that they observed an individual-level protective effect against HIV infection associated with participation in a syringe-exchange program. So, at the start of the study, they were looking at persons who "self-selected" to take part in the needle exchange or not. It's not ethical to randomize persons to this, because needle exchange, if someone adhered to it 100 percent, would ultimately be effective in reducing the transmission of infectious diseases, so it's not ethical to deny somebody that opportunity by assigning them to a control group. So, people self-selected into these two groups. What the researchers did is start at a certain point in time with persons who were HIV negative, classify them as to whether they were participating in the needle exchange program or not, and follow them for up to a fixed amount of time to see who developed HIV between the start and the end of the study. Again, they observed an individual-level protective effect against HIV infection for those who participated in the needle or syringe exchange program. But ultimately, they reached this conclusion not by just directly comparing, head-on, the HIV infection percentages or rates between these two groups.
They also had to account for a lot of other differences, or potential differences, between these two groups that may be related to HIV as well, including their age, their sex, their race, their frequency of injection, et cetera. Here's another such study, an observational cohort study on HPV vaccination and sexual activity in teens; they focused on female teens for this study. The cohort included almost 1,400 girls, 493 who were vaccinated and 905 who were not, and the choice to be vaccinated was made by the teen and the parent, not by the researcher. So, they started with this, and this actually used insurance records: from the time they ascertained vaccine status, HPV vaccine or no HPV vaccine, they followed the girls over time and looked at markers of sexual activity in their medical records. What they concluded was that HPV vaccination at the recommended ages was not associated with increased sexual activity-related outcomes. But of course, there could be a lot of differences between those teens whose parents allowed them to get the vaccine and those whose parents didn't authorize it. So, they had to level the playing field, so to speak. They came to this conclusion after the association was adjusted for other characteristics of the teens, including health care-seeking behavior and demographic characteristics. So here, with this idea of adjustment, I'm getting at one of the biggest challenges with regard to analyzing observational studies: something called potential confounders. We'll define confounders formally in the second half of the course and talk about dealing with them. But confounders are factors that are related to both the outcome and the exposure of interest, and ignoring them can distort or negate the association of interest.
So, for example, if we were studying the association between the common cold and alcohol consumption in the winter months in a cohort of persons, well, those who drink more alcohol may be more likely to smoke cigarettes, and smoking is associated with increased cold risk. So, if we find a positive association between the common cold and drinking more alcohol, it may be fueled, at least in part, by this behind-the-scenes relationship between smoking and drinking alcohol and between smoking and the risk of a cold. We can adjust associations of interest for potential confounders, and we'll delve into a whole set of methods for doing this in the second part of the course, term two. But the nagging question, even after adjustment, is: what confounders did you not measure or address? With randomization, the idea is that we can eliminate systematic differences between the groups being compared on things we could conceptualize as potential confounders and also on things we'd never think of. But in observational studies, we can only adjust for confounders or potential confounders when we've conceptualized them and measured that information. Sometimes observational studies generate ideas that can be tested by a randomized trial. So, there are some situations where observational studies are done even when the exposure of interest can be randomized; they are done first to generate evidence for deciding whether or not to proceed with a randomized trial. So, this was the case with beta carotene and health. The background on this is that observational studies suggested that people who consume more fruits and vegetables containing beta carotene have somewhat lower risks of cancer and cardiovascular disease, and earlier basic research suggested plausible mechanisms. Because large randomized trials of long duration were necessary to test this hypothesis directly, without the threat of confounders, the researchers conducted a trial of beta carotene supplementation.
What they found in this randomized study was that, among healthy men at least, 12 years of supplementation with beta carotene produced neither benefit nor harm in terms of the incidence of malignant neoplasms, cardiovascular disease, or death from all causes. So, even though there was some evidence in the observational studies, this randomized trial did not find anything. In the ladder or hierarchy of evidence, though, the randomized trial is considered to be more conclusive than the results of an observational study because of that idea of unmeasured confounders in an observational study. There's one last type of study, the case-control study. We won't consider it much in this course, but it's a big deal in certain types of epidemiology and it's worth discussing, and certainly some big public health decisions have been made based on the results from such studies. So, in the previously discussed prospective cohort studies, both randomized and observational, the subjects either had their exposure status assigned to them or were selected and then had their exposure status classified. The outcome of interest was assessed over time after the exposure had occurred. In situations in which researchers wish to study exposures associated with rare outcomes, it is not necessarily feasible to do a prospective cohort study. Such an approach would require a very large number of enrollees in order to see any outcomes over time in the samples being compared. So, a useful alternative to a cohort study in this scenario is the case-control study. In the case-control study design, enrollees are selected on whether they have the outcome or not, usually a rare disease; a sample is created from a group that has the outcome and a group that does not, and then the exposures of each individual in the study are assessed.
So, one of the biggest analytical issues with case-control studies is that we can't directly estimate the risk of the outcome under study in the population from which the sample was drawn. This is because the researchers actually set the prevalence or risk in the sample through the number of cases and the respective number of controls they chose by design. So, every comparison of risks between subgroups in our sample will be influenced by the researchers' choice. So, for example, if for every single case the researcher chooses one control, then the proportion of persons in our sample who have the outcome, the proportion who are cases, is 50 percent by researcher design. If the researcher chose two controls for every case, then the researcher-set risk of being a case in the sample would be 33 percent, or one in three. So, every estimate of risk that we make in subgroups of our cases and controls, like smokers compared to non-smokers, will be influenced by how the researcher chose the cases and controls. So, one example of a landmark public health study that used a case-control design was the first study to show an association between lung cancer and smoking cigarettes. This is from the seminal article by Doll and Hill published in the British Medical Journal in 1950. The method of investigation was as follows: twenty London hospitals were asked to cooperate by notifying the researchers of all patients admitted to them with carcinoma of the lung. Then, for each lung-carcinoma patient visited at the hospitals, the almoners at the hospital were instructed to interview a patient of the same sex, within the same five-year age group, who did not have lung cancer. So, for each case, they were trying to find a comparable control in terms of biological sex and age.
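As a small numerical aside on the earlier point about researcher-set proportions, the sketch below shows how the share of cases in a case-control sample follows directly from the number of controls recruited per case. The function name `sample_case_fraction` and the ratio of four controls per case are illustrative assumptions; the one-to-one and two-to-one ratios are the ones mentioned in the lecture.

```python
from fractions import Fraction


def sample_case_fraction(controls_per_case):
    """Share of the study sample that are cases when the researcher
    recruits `controls_per_case` controls for every case."""
    return Fraction(1, 1 + controls_per_case)


for k in (1, 2, 4):
    frac = sample_case_fraction(k)
    print(f"{k} control(s) per case: {frac} of the sample are cases "
          f"({float(frac):.0%})")
```

None of these percentages says anything about how common the disease is in the population. They change only because the researcher changed the design, which is why risks cannot be read directly off a case-control sample.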
We won't do much with case-control studies in this class, but we will use this one to illustrate what we can do analytically in the context of this study design. At the end of the article, the authors wrote, "Consideration has been given to the possibility that the results could have been produced by the selection of an unsuitable group of control patients," that is, a group systematically different from the cases, "by patients with respiratory disease exaggerating their smoking habits, or by bias on the part of the interviewers." In the article, they give reasons for excluding all of these possibilities, and the conclusion, which we know quite well by now, is that smoking is an important factor in the cause of carcinoma of the lung, or lung cancer. So, what are the challenges with regard to analyzing case-control studies? Well, just as with observational cohort studies, potential confounders are also an issue with case-control studies, in that the cases and controls may differ systematically on other factors besides the exposure of interest. Those with lung cancer may differ from those without lung cancer in factors other than smoking, and certainly we can adjust for those statistically, but the nagging question is, "What confounders have not been addressed?" There's also something called recall bias. Depending on how exposures are assessed, sometimes it's done through medical records, but a lot of the time it's done through patient interview, and cases and controls may recall exposures differently. If somebody has now been diagnosed with lung cancer, they may pay more attention to, or exaggerate, their smoking history compared to somebody who doesn't have lung cancer. These exposures are assessed after they have occurred, sometimes a very long time afterward, so respondent memory can also be an issue. So, we've talked about three major types of study designs.
Two of these are prospective cohort designs: randomized trials and observational cohort studies. They'll be the focus of most of what we do in the course. Then we talked about one other study design type that has some prominence in epidemiological studies, the case-control study. In the first two study design types, the cohort studies, the exposure precedes the outcome in time. The cohort initially consists of people who haven't had the outcome of interest, and they are followed over a fixed period of time to see who develops it. In the case-control study, subjects are chosen based on whether they have the outcome or not. The biggest issue with non-randomized studies has to do with that term we brought up, confounders, and we will formally dig into this in detail in the second quarter of this course: these are systematic differences between the unexposed and exposed groups that may also be related to the outcome. So, we always want to consider the study design when taking our statistical results and turning them into scientific conclusions, and we'll pay attention to that idea throughout the course.
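For anyone who wants to see the confounding idea in numbers before the formal treatment later in the course, here is a minimal simulation sketch built on the alcohol, smoking, and common cold example from earlier. Every probability in it is a made-up value chosen for illustration, not an estimate from any study, and alcohol is given no effect on colds by construction. Comparing drinkers with non-drinkers separately within smokers and within non-smokers is one simple form of adjustment; it is not necessarily the method the course will use.

```python
import random

rng = random.Random(0)
N = 200_000  # hypothetical cohort size

# All probabilities below are made-up values for illustration.
P_SMOKER = 0.3
P_DRINK = {True: 0.7, False: 0.3}  # heavier drinking is more common among smokers
P_COLD = {True: 0.4, False: 0.2}   # cold risk depends on smoking only

people = []
for _ in range(N):
    smoker = rng.random() < P_SMOKER
    drinker = rng.random() < P_DRINK[smoker]
    cold = rng.random() < P_COLD[smoker]  # alcohol has no effect by construction
    people.append((smoker, drinker, cold))


def risk(rows):
    """Proportion of rows in which the person got a cold."""
    return sum(cold for _, _, cold in rows) / len(rows)


def risk_difference(rows):
    """Risk of a cold among drinkers minus risk among non-drinkers."""
    drinkers = [r for r in rows if r[1]]
    non_drinkers = [r for r in rows if not r[1]]
    return risk(drinkers) - risk(non_drinkers)


crude = risk_difference(people)
within_smokers = risk_difference([r for r in people if r[0]])
within_non_smokers = risk_difference([r for r in people if not r[0]])

print(f"crude risk difference:         {crude:.3f}")
print(f"among smokers only:            {within_smokers:.3f}")
print(f"among non-smokers only:        {within_non_smokers:.3f}")
```

In this setup the crude comparison shows drinkers with a noticeably higher cold risk, while the comparisons within smoking groups are close to zero. The catch, as discussed above, is that this only works because smoking was measured; an unmeasured confounder could not be handled this way.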