My name is Daniel Angulo, and today we're going to talk about interrupted time series analysis. In this lecture, we will reproduce the results of the interrupted time series tutorial from the London School of Hygiene and Tropical Medicine. We will also discuss the typical data analysis process and the parts where you would jump in as a policy analyst. As Dr. Lance said in a previous lecture, the data set contains information about the number of acute coronary episodes per month in Sicily, Italy, before and after a population-level health intervention. Our goal is to evaluate the effectiveness of that intervention.

As you can see, I am using an R script because it is what I usually do as a data analyst and statistician. I'm using a library called pacman, whose p_load function loads a package, which means it installs the package if needed and attaches it to the R session. I like it better than writing library and then the name of the package because I feel it saves me time. I don't necessarily use all the packages at once, but they are already loaded in case I need them. Since I am working with an R script instead of an R Markdown document, I'm going to hit Ctrl + Enter on each line of code to run it. We can see that it ran without any issues because there are no errors or warning messages in the console.

After loading all of the packages, I read the data set and conduct an exploratory analysis. I'm using the read.csv function because the file I'm using is a CSV file. One of the things I enjoy most about working with R is how easy it is to view data files from the environment. It only takes a click. So, for example, I can click on the data in the environment panel right here and look at the data file. We can see that we have 59 observations and seven different variables. We have the year, the month, and aces, which is the number of acute coronary episodes. We have time, which is a variable that goes from 1 to 59.
We have smokban, which is our intervention variable, coded 0 for the period before the intervention and 1 for the period after the intervention. And we have the population and the standardized population.

To achieve our goal, which is to determine the effectiveness of the intervention, we first create a variable for the standardized rate of acute coronary episodes per 100,000 people. I'm using mutate to create a new variable called standardized rate, that is, the ratio of two variables in the data set multiplied by 100,000. We can see that the code ran without any problems because if I open the data set one more time, there's one new variable called standardized rate.

We will first create some visualizations, then some statistics, and at the end we will run a model and assess the results. Since we have multiple observations over time, we can create a scatter plot of the standardized rate of acute coronary episodes over time and assess visually whether there is a difference in the trend before and after the intervention. I start by piping the data set and then creating a ggplot object, mapping the date to the x axis and the standardized rate to the y axis. I'm also using geom_point because it is a scatter plot. I use theme_bw because it creates clean aesthetics that are easy on the eye. However, I'm using the function theme to override some aspects of the title. I'm also using labs to modify the title and the x axis. Additionally, to create a linear trend line, I'm using geom_smooth. And finally, I create a light gray rectangle to indicate the post-intervention period using geom_rect. We run the code, and we can see that there are no issues because we have the plot.

For me, creating a plot is an iterative process. I start with the basics, piping the data set and creating the mapping, and then work from there. I build the plot layer by layer.
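The standardized rate and the layered scatter plot described here can be sketched roughly as follows. This is a minimal sketch, not the lecture's actual script. The data frame `sim` is simulated with made-up numbers and is not the Sicily data. The column names (`aces`, `time`, `smokban`, `stdpop`), the choice of `aces` and `stdpop` as the two variables in the ratio, the use of `time` rather than a date on the x axis, and the cutoff after month 36 are all assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in data: made-up numbers for illustration only,
# NOT the Sicily data set used in the lecture.
set.seed(1)
n_months <- 59
sim <- data.frame(
  time    = 1:n_months,
  smokban = as.integer(1:n_months > 36),  # arbitrary cutoff for this sketch
  stdpop  = 380000                        # constant, made-up population
)
sim$aces <- rpois(
  n_months,
  lambda = 800 * exp(0.005 * sim$time - 0.12 * sim$smokban)
)

# Standardized rate per 100,000 people (assumed here to be aces / stdpop).
sim <- sim %>%
  mutate(std_rate = aces / stdpop * 1e5)

# One-row data frame so the shaded rectangle is drawn only once.
post <- data.frame(xmin = 36.5, xmax = n_months + 0.5)

# Build the plot layer by layer: shading, points, linear trend, theme, labels.
p <- sim %>%
  ggplot(aes(x = time, y = std_rate)) +
  geom_rect(
    data = post,
    aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf),
    inherit.aes = FALSE, fill = "grey80", alpha = 0.5
  ) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +  # unadjusted linear trend line
  theme_bw() +
  labs(
    title = "Standardized rate of acute coronary episodes (simulated)",
    x = "Month of study",
    y = "Episodes per 100,000"
  ) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))

p
```

Drawing the rectangle before the points keeps the shading behind the data, and each `+` line can be added and rerun on its own while the plot is being refined.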
Every time I add or change something, I run the code, look at the plot, and modify it until I'm satisfied with the results.

So if we look at the plot right here, the blue line represents the predicted trend based on the unadjusted linear regression model. The white background represents the pre-intervention period, and the gray background the post-intervention period. The plot shows that most points in the post-intervention period are below the trend line rather than distributed randomly around it. That pattern would suggest a significant decrease in acute coronary episodes after the intervention.

From this point on, the analysis is in the hands of the statistician. Given the study design, the available data, and the hypothesis, the statistician will fit some statistical models to test the hypothesis. Before digging into a statistical method appropriate for this particular study design, we will test whether the mean number of acute coronary episodes before and after the intervention is significantly different. I will use tbl_summary, which is a function from the package gtsummary, to customize a summary table of descriptive statistics. I find it especially useful when creating a Table 1, which is usually the descriptive statistics of the study population, while writing a paper. First, I pipe the data and then select the variables I want to display in the table. Then I use the tbl_summary function to create statistics by the intervention variable. Remember that you can always type a question mark followed by the name of the function, in my case ?tbl_summary, to find out more about what you can change or modify in this function. We will run the code right now. And we can see that in the viewer panel we have a table very similar to the Table 1 that you will find in any paper.
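A descriptive table of this kind might be built along the following lines. This is a minimal sketch on simulated, made-up data, not the lecture's data or exact code. The column names, the mean (SD) display, and the choice of a t test for the p value are assumptions, and depending on the installed gtsummary version, `add_p()` may ask for additional helper packages.

```r
library(dplyr)
library(gtsummary)

# Simulated stand-in data: made-up numbers for illustration only,
# NOT the Sicily data set used in the lecture.
set.seed(1)
n_months <- 59
sim <- data.frame(
  time    = 1:n_months,
  smokban = as.integer(1:n_months > 36),  # arbitrary cutoff for this sketch
  stdpop  = 380000                        # constant, made-up population
)
sim$aces <- rpois(
  n_months,
  lambda = 800 * exp(0.005 * sim$time - 0.12 * sim$smokban)
)
sim$std_rate <- sim$aces / sim$stdpop * 1e5

# Pipe the data, keep the variables to display, and summarize them
# by the intervention variable as mean (SD).
tab <- sim %>%
  select(smokban, aces, std_rate) %>%
  tbl_summary(
    by = smokban,
    statistic = all_continuous() ~ "{mean} ({sd})"
  ) %>%
  add_overall() %>%                           # overall column
  add_p(test = all_continuous() ~ "t.test")   # assumed test: two-sample t test

tab
```

Running `?tbl_summary` and `?add_p` lists the other statistics and tests that can be requested.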
For every variable, we have the overall mean and, in parentheses, the standard deviation, and then we also have the mean and standard deviation for the post-intervention period and the pre-intervention period. And finally, we have the p value comparing pre and post intervention. Using a significance level of 5%, we can see from the p value that the number of acute coronary episodes is significantly different between the pre- and post-intervention periods. However, when we account for the standardized population, we don't see any difference at a 5% significance level.

Because the intervention has a clear cutoff point and the outcome is short term, we will perform an interrupted time series analysis using a Poisson regression. The model uses the following structure. The outcome variable is the number of acute coronary episodes in each time period. As explanatory variables, we use the time elapsed since the start of the study and a binary variable for the pre- and post-intervention period. If you want to take a deeper look into the modeling process, we refer you to the paper. We will run the model and use the summary function to assess the results.

The results of this particular model appear in the console. So right here we can see, for the intervention variable, first that the p value is less than 5%, which is the significance level that we usually use. Therefore it's a significant variable. We can also see that the estimate is negative, which means a decrease in the post-intervention period, and exponentiating the estimate gives a decrease of approximately 11%. The model shows strong evidence, because the p value is less than 0.001, of a reduction in the number of acute coronary episodes following the smoking ban, with an average decrease of 11%.

We save the model predictions and then plot the same scatter plot as before. But instead of using a linear trend, we will use the model predictions here.
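The model structure just described, and the step from a negative estimate to a percentage change, can be sketched in base R as follows. This is a minimal sketch on simulated, made-up data, so its numbers will not match the lecture's output. The column names are assumptions, and the formula contains only the two variables named in the lecture. Whether the original script also includes an offset term is not stated, so none is used here.

```r
# Simulated stand-in data: made-up numbers for illustration only,
# NOT the Sicily data set used in the lecture.
set.seed(1)
n_months <- 59
sim <- data.frame(
  time    = 1:n_months,
  smokban = as.integer(1:n_months > 36),  # arbitrary cutoff for this sketch
  stdpop  = 380000                        # constant, made-up population
)
sim$aces <- rpois(
  n_months,
  lambda = 800 * exp(0.005 * sim$time - 0.12 * sim$smokban)
)

# Poisson regression: monthly count explained by elapsed time and a
# binary pre/post intervention indicator.
fit <- glm(aces ~ time + smokban, family = poisson, data = sim)
summary(fit)

# The coefficients are on the log scale, so exponentiating the
# intervention estimate gives a rate ratio, and (ratio - 1) * 100
# gives the percentage change after the intervention.
rate_ratio <- exp(coef(fit)["smokban"])
pct_change <- (rate_ratio - 1) * 100

# Save the model predictions as new columns using the dollar sign.
sim$pred_aces <- predict(fit, type = "response")       # predicted counts
sim$pred_rate <- sim$pred_aces / sim$stdpop * 1e5      # per 100,000 people
```

A rate ratio below 1 corresponds to a negative estimate, which is why a negative coefficient for the intervention variable reads as a decrease.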
I'm using the dollar sign to create a new variable instead of using the mutate function. However, you could use the mutate function here. As you can see, I'm using mostly the same code as before. The only difference is that instead of using geom_smooth, I'm using the model predictions as a second set of points to display. So first I pipe the data, then I create the ggplot, and I use two geom_point layers, one for the real data and one for the predicted data. The rest of the plot is basically the same. I use theme_bw, I use labs to change the title and the axes, and I override some aspects of the theme.

Notice that in the new plot we can observe the 11% reduction in the standardized rate. So in the pre-intervention period we can see that the trend line goes here. However, in the post-intervention period there is a decrease, and the line starts around here and goes from there.

Today, we worked through an example of assessing the effectiveness of a population-level intervention. We used some compelling visuals to see the phenomenon and quantified the effectiveness using some statistical models. Sometimes these models are hard to understand. However, that is the beauty of working in an interdisciplinary team. Interacting with statisticians and people from various backgrounds will be an essential part of your role.