So in this first lecture set, I'd like to give you an introduction and overview of what we'll be doing in the course. First, I'll try to convince you, as I will at every point throughout the course, that statistics and biostatistics are an important part of public health research and practice. Then we'll talk about some ways of sampling from a larger population. Frequently in statistics, we study some subset of a larger population or process in the hope of learning about that larger population or process, because we cannot observe every element of it. Then we'll talk about some of the study design types encountered in the literature on public health research and practice. Finally, we'll talk about some of the data types we'll encounter throughout both terms of this course. So, just to get you started (and this is not the last time I'll bring this up; as I said, I hope the entire course makes you think about it): why do you need biostatistics in your life as someone studying public health? Well, interestingly enough, a lot of people are starting to think they need statistics in their lives. If you had told me 20 years ago that statistics would become part of popular culture, I would have been surprised. But let's look at some recent history. Back in 2009, Hal Varian, Google's chief economist, was quoted as saying, "I keep saying that the sexy job in the next 10 years will be statisticians, and I'm not kidding." That was followed a couple of years later by a Harvard Business Review headline: "Data Scientist: The Sexiest Job of the 21st Century." It's interesting to note that within a three-year period, at least two statements in prominent publications included both the word "sexy" and statistics. Also in 2009, a headline in The New York Times said, "For Today's Graduate, Just One Word: Statistics."
Well, one of the reasons people are starting to wake up and take an interest in statistics, aside from the fact that it's awesome and interesting, is that data are everywhere. For example, I went to the Washington Post on March 18th, 2017, and found this article. The headline says "Web-based counseling reduces blood pressure, according to a new study." Here's a synopsis. It says you don't need to go to a doctor in person for lifestyle counseling that can lower your blood pressure. Online lifestyle counseling works well too, according to research presented Saturday at the American College of Cardiology's 66th Annual Scientific Session and Expo in Washington, DC. Systolic blood pressure, the higher number in blood pressure readings, declined more for participants in the study who received Web-based lifestyle counseling than for those who were part of a Web-based control intervention, the study found. And here's where they quote some statistics. Over the 12-month period, systolic blood pressure of people in the Web-based lifestyle counseling group decreased by 10 millimeters of mercury, compared to a decrease of six millimeters of mercury for the other group. So they measured the change in blood pressure in each of these groups and showed that the decrease was larger, by four millimeters of mercury, for those who got the Web-based counseling. On March 15, 2017, a New York Times headline said "Canadians with cystic fibrosis live 10 years longer than Americans with the disease." Here's a short piece from the article, which is based on published journal research. It says Canadians with cystic fibrosis survive on average more than 10 years longer than Americans with the same disease, largely because of differences in the two countries' health insurance systems, a new study suggests. Cystic fibrosis is an inherited disease that causes recurrent lung infections and other problems. The average lifespan for an American with the illness is 37 years. In Canada, it's 49 years.
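The blood pressure comparison above comes down to two steps: compute the average within-person change in each group, then take the difference between the groups. Here is a minimal Python sketch of that arithmetic. The individual readings are made up for illustration (they are not the study's data) and were chosen only so that the group averages match the reported 10 and 6 mmHg decreases.

```python
from statistics import mean


def mean_decrease(baseline, follow_up):
    """Average within-person decrease: baseline reading minus follow-up reading."""
    return mean(b - f for b, f in zip(baseline, follow_up))


# Hypothetical systolic readings (mmHg), NOT study data; chosen so the group
# means match the reported 10 and 6 mmHg decreases.
web_baseline, web_12mo = [150, 145, 160, 138], [140, 137, 148, 128]
ctrl_baseline, ctrl_12mo = [148, 152, 141, 155], [142, 147, 134, 149]

web_decrease = mean_decrease(web_baseline, web_12mo)        # 10
control_decrease = mean_decrease(ctrl_baseline, ctrl_12mo)  # 6
bp_effect = web_decrease - control_decrease                 # 4 mmHg larger decrease

print(f"Web counseling: {web_decrease} mmHg, control: {control_decrease} mmHg, "
      f"difference: {bp_effect} mmHg")
```

This expresses the comparison as a difference in average change. The same two group averages could also be compared on another scale, such as a ratio.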
So what the researchers did was compare survival between Canadians and Americans with cystic fibrosis, look at factors influencing that survival, and control for things that may differ systematically between Americans and Canadians to see whether the difference persisted. According to this article, one of the biggest drivers of the difference they found was the difference in the two countries' health insurance systems. Some other headlines have caught my eye at different times. One of my favorites of recent times was a headline from the Baltimore Sun in August of 2012: "Elmo makes apples more appealing to kids." Elmo is a Sesame Street character, a red puppet with a particularly high-pitched, grating voice, in my opinion, but very cute nevertheless. The article said kids took nearly twice as many apples when they had Elmo stickers on them as when they didn't, researchers from Cornell University said in a letter in the August issue of the Archives of Pediatrics and Adolescent Medicine. So they measured the Elmo effect, if you will, and saw that kids took more fruit when it had Elmo stickers on it. But what they didn't tell us in that letter is whether the kids actually ate those apples, and that's an important piece of information. Here's a sobering statistic that came out in the Washington Post in August of 2009. The headline was "DC to offer STD tests in every high school," that is, tests for sexually transmitted infections in every high school. In a pilot study conducted in eight high schools in the District of Columbia, they found that a staggering 13% of the 3,000 students tested positive for STDs, mostly gonorrhea or chlamydia. This is an incredibly important finding, which would certainly motivate changes in public health practice in DC schools and provides evidence that there is an issue with STIs in the school system. So why is statistics important?
Well, let's talk about the general steps in a research project. One way to summarize them: first, conceptualize and plan the design of the study; then go out and collect the data; then analyze the data; and finally, present and interpret the results. Statistics can play a role in all of these steps. Unfortunately, it's sometimes called upon only in the later stages, when the data are being analyzed. But statistics is also very helpful in deciding how much data to collect and how to collect it. So let me highlight that in this synopsis: statistics is part of the whole research process. To sum up, how can statistics help us in the planning and design of studies? Well, in terms of the primary question of interest, do we want to quantify information about a single group, or quantify differences between groups? And how do we want to quantify those differences when we're comparing multiple groups? Then we go on to sample size: how many subjects do we need in each group to get a statistically precise result? How are we going to select study participants? Are they going to be randomly chosen from a master list of potential participants in a larger population, if such a list exists? Are they going to be selected from a pool of people interested in participating in the study? Are we going to take whoever shows up? If group comparison is of interest, how are we going to assign people to the groups? Can we do a randomized study, for example, and assign people to intervention or control? Or are we studying a grouping factor that people self-select into, like whether they smoke or not, or whether they live in Baltimore or another city? Then there's data collection. It's important to collect the data properly and to pay attention to their accuracy; otherwise, errors can cause problems later in the study.
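The two strategies mentioned above, random selection from a master list and randomized assignment to intervention or control, can each be sketched in a few lines of Python. This is a minimal illustration assuming a hypothetical master list of 500 IDs and two equal-sized arms; real trials often use more elaborate schemes, such as blocked or stratified randomization.

```python
import random


def simple_random_sample(master_list, n, seed=None):
    """Draw n people from the master list without replacement; each has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(master_list, n)


def randomize(participants, seed=None):
    """Randomly split participants into intervention and control arms of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


# Hypothetical sampling frame: 500 potential participants identified by ID.
master_list = [f"person_{i:03d}" for i in range(500)]

sample = simple_random_sample(master_list, 40, seed=1)
arms = randomize(sample, seed=2)
print(len(arms["intervention"]), len(arms["control"]))  # 20 20
```

Fixing the seeds makes the draw reproducible, which is useful for auditing a study's selection and assignment. In practice, the seed and procedure would be set before any data are collected.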
And then, certainly, the data analysis portion involves a lot of statistics: how best to summarize the information coming from the raw data; how to deal with variability in the data, both natural variation and sampling-related variability; how to distinguish real patterns from random variation in the data; and something we'll spend time on, inference. Inference means using the information from a single study, coupled with information about variability, to make a statement about the larger population or process of interest, which we cannot observe directly and can only study through samples. So what statistical methods are appropriate given the data collected? How do we summarize them, etc.? Then, in presenting the results, what summary measures will best convey the main messages in the data about the primary and secondary research questions of interest? And how do we convey and quantify the uncertainty in the estimates based on the data? And finally, what do the results mean in terms of practice, the program, the population, etc.? So statistics plays a pretty big role in the research process, not just at the end. So what are our goals for the course? Well, in Term 1, we're going to focus on how to properly summarize data and interpret summary measures for different types of data, whether we're summarizing single samples or comparing samples, where the samples serve as estimates of what's going on in the populations from which they come. We'll talk about how to quantify or measure differences between the samples we're comparing. We'll talk about interval estimation, specifically confidence intervals, and statistical inference via hypothesis testing, and these two approaches will be complementary. And we'll also talk about some statistical sample size considerations when designing a study.
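Interval estimation and sample size planning are closely linked, because both rest on the standard error of an estimate. The sketch below uses the simple large-sample (Wald) confidence interval for a proportion, which is one of several available intervals. It then inverts that interval's margin of error to find a required sample size. For illustration, it reuses the DC pilot figure of 13% of 3,000 students (390 positives) and treats those students as a simple random sample. That assumption is made only for the sake of the example.

```python
import math
from statistics import NormalDist


def wald_ci(successes, n, level=0.95):
    """Large-sample (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of p_hat
    return p_hat - z * se, p_hat + z * se


def sample_size_for_proportion(p_guess, margin, level=0.95):
    """Smallest n whose Wald margin of error is at most `margin` when p is near p_guess."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil(z**2 * p_guess * (1 - p_guess) / margin**2)


# Illustration only: 390 of 3,000 students, treated as if a simple random sample.
low, high = wald_ci(390, 3000)
print(f"95% CI: {low:.3f} to {high:.3f}")      # 95% CI: 0.118 to 0.142
print(sample_size_for_proportion(0.13, 0.02))  # 1087
print(sample_size_for_proportion(0.5, 0.03))   # 1068 (worst case p = 0.5)
```

The second function shows why a rough guess of the proportion matters at the design stage. When no guess is available, using p = 0.5 gives the most conservative (largest) sample size.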
In Term 2, we'll detail the idea of adjustment for situations where our exposures of interest are not randomized. We'll talk about the concept called effect modification, or statistical interaction, and how to assess it. And we'll talk about using linear, logistic, and time-to-event regression to model outcomes as functions of potentially multiple predictors, and to systematically assess and adjust for confounding and assess effect modification. Throughout all of our endeavors, and this is what I want to be our main focus, the emphasis will be on interpreting the results of statistical procedures correctly, summarizing the results from published studies in an understandable fashion, and assessing the strengths and weaknesses of published research results, including the study design, the clarity of the research question, the appropriateness of the statistical methods, the clarity of the reported results, and the appropriateness of the overall scientific and substantive conclusions. So I'm very thrilled to be with you, and I look forward to working with you over the next eight weeks and beyond.
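Adjustment for confounding through regression can be illustrated with a tiny, made-up data set. In this data set, a confounder both makes exposure more likely and raises the outcome. The sketch below assumes NumPy is available. It fits ordinary least squares twice: once with exposure alone (the crude estimate) and once with exposure plus the confounder (the adjusted estimate). The outcome is constructed without noise as 2 + 1*exposure + 3*confounder. As a result, the adjusted coefficient recovers the built-in exposure effect of 1, while the crude coefficient is inflated.

```python
import numpy as np


def ols(y, *predictors):
    """Ordinary least squares coefficients: intercept first, then one per predictor."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical data: the confounder (e.g., older age group = 1) makes exposure
# more likely AND raises the outcome. The true exposure effect is built in as 1.
confounder = np.array([0, 0, 0, 0, 1, 1, 1, 1])
exposure = np.array([0, 0, 0, 1, 0, 1, 1, 1])
outcome = 2 + 1 * exposure + 3 * confounder

crude = ols(outcome, exposure)                  # ignores the confounder
adjusted = ols(outcome, exposure, confounder)   # adjusts for the confounder
print(f"crude exposure effect:    {crude[1]:.2f}")     # 2.50
print(f"adjusted exposure effect: {adjusted[1]:.2f}")  # 1.00
```

With real data there would be random noise, so the adjusted estimate would only approximate the underlying effect and would come with a confidence interval.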