In the next series of videos, we're going to work our way through a number of different kinds of figures, getting a real workout with ggplot and some associated libraries. The goal of these videos is to give you practice making a wide variety of figures. So for these videos, and really throughout this course, we're not going to put as much emphasis on data wrangling; we're just going to make a lot of figures and get some practice.
We'll start with some variations on scatter plots. Many of you probably have some basic training in statistics, or at least you're familiar with the idea of putting a trend line through a scatter plot. This gives you a better sense of the relationship between two variables than you might get from just looking at a cloud of points. For example, I will use a set of survey data here that we've used in previous courses, from the Cooperative Congressional Election Study. In that survey data, there are questions about the educational level of survey respondents in the United States and their political ideology. If you pay attention to US politics at all, or really to politics in many different parts of the world, you might be aware that a higher level of education is often associated with greater political liberalism.
In the survey data, the variable ideo5 captures political ideology, with five being the most conservative and one being the most liberal. The educ variable captures education level, with six being the highest and one being the lowest. So we ought to see a negative correlation between these two variables. It's hard to assess from a scatter plot alone what that relationship really looks like, but drawing a best fit line gives you a better sense of it. We add a best fit line to the scatter plot here with geom_smooth. There are a lot of different approaches for drawing a best fit line, but the default that geom_smooth uses for smaller data sets is a loess curve.
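As a rough illustration of the scatter plot with a best fit line described above, here is a minimal sketch. The data frame `survey` is simulated, not the real survey responses, and the coefficients used to generate it are made up; only the variable names `educ` and `ideo5` come from the lecture. The sketch uses a straight-line fit (`method = "lm"`) rather than the default smoother, purely to keep it simple and predictable with an x variable that takes only six values.

```r
library(ggplot2)

# Simulated stand-in for the survey data (an assumption, NOT real responses):
# educ runs from 1 to 6, ideo5 from 1 to 5, with a negative relationship built in.
set.seed(42)
n <- 500
educ <- sample(1:6, n, replace = TRUE)
ideo5 <- pmin(pmax(round(4 - 0.3 * educ + rnorm(n)), 1), 5)
survey <- data.frame(educ = educ, ideo5 = ideo5)

p_smooth <- ggplot(survey, aes(x = educ, y = ideo5)) +
  # Jitter the points so that repeated (educ, ideo5) pairs do not overlap exactly
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.3) +
  # Linear best fit line with a 95% confidence band around it
  geom_smooth(method = "lm", formula = y ~ x, level = 0.95)

print(p_smooth)
```

Changing `method` swaps the fitting approach, `level` changes the confidence level of the band, and `se = FALSE` removes the band altogether.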
If you're statistically inclined, you can select several different methods for drawing that line, like a linear model or a generalized linear model, and you can also change the confidence interval drawn around the line. But be very careful with this if you're not statistically inclined, because doing statistical inference with this kind of approach is pretty dicey. Exploratory analysis, however, is fine, so if you want to draw some best fit lines on your scatter plots to get a sense for the data, just sort of an intuition, that's an okay thing to do.
In addition to a best fit line, another way to use scatter plots for exploratory visualization is a scatter plot matrix, which shows you the bivariate relationships between several different variables in your data at once. The library GGally has a handy tool for creating a scatter plot matrix. These lines of code have created some simple fake data, with variable one and variable two being tightly correlated and the third variable being negatively correlated with variable one. When we put these together into a table and feed that into the function ggpairs, it generates a scatter plot matrix for us showing the distribution of each variable, the bivariate scatter plots, and the correlation coefficient for each pair of variables. If you want to modify the look of the scatter plots and the density plots in a scatter plot matrix that you make like this, you can write custom plotting functions and then pass them to ggpairs using the lower and diag arguments. This is the way that you can control color and fill and other aspects of the visualization.
A related way to draw a plot for correlations is a shaded correlation matrix. In the ggcorrplot package, the ggcorrplot function will generate this for us. Here I'm selecting a few variables from the survey data, I calculate correlation coefficients for them, and I generate the correlation plot using ggcorrplot.
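The scatter plot matrix and the shaded correlation matrix described above can be sketched roughly as follows. The variable names (`var1`, `var2`, `var3`), the noise levels, and the helper functions `my_scatter` and `my_density` are all assumptions made for illustration, not the lecture's actual code. The correlation plot here reuses the fake data rather than the survey variables mentioned in the lecture.

```r
library(ggplot2)
library(GGally)
library(ggcorrplot)

# Fake data: var2 tracks var1 closely, var3 moves in the opposite direction
set.seed(1)
n <- 200
var1 <- rnorm(n)
var2 <- var1 + rnorm(n, sd = 0.2)
var3 <- -var1 + rnorm(n, sd = 0.8)
fake <- data.frame(var1, var2, var3)

# Hypothetical custom panel functions that control color and fill
my_scatter <- function(data, mapping, ...) {
  ggplot(data = data, mapping = mapping) +
    geom_point(color = "steelblue", alpha = 0.5)
}
my_density <- function(data, mapping, ...) {
  ggplot(data = data, mapping = mapping) +
    geom_density(fill = "grey70")
}

# Scatter plot matrix: custom scatter plots below the diagonal,
# custom density plots on the diagonal, correlations above it
p_matrix <- ggpairs(
  fake,
  lower = list(continuous = my_scatter),
  diag  = list(continuous = my_density)
)

# Shaded correlation matrix built from the correlation coefficients
corr <- cor(fake)
p_corr <- ggcorrplot(corr, lab = TRUE)

print(p_matrix)
print(p_corr)
```

Calling `ggpairs(fake)` with no further arguments gives the default panels, which is a reasonable starting point before writing custom functions.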
ggcorrplot has a number of different options for modifying the colors and other visual features of the plot, and you can find those in the help file. You can also apply a ggtheme if you like, as you may recall doing in the previous course.
Beyond summarizing bivariate relationships, you can also modify figures that you make with geom_point to compare values across units of observation, kind of like a bar plot. In fact, many professional data visualizers prefer these kinds of plots to bar charts because of their visual simplicity. One figure we're talking about here is called a Cleveland dot plot, and to give you an example of this, I'll draw on the data about lawmaking in the US Congress that we've used in previous courses. Here, I'm selecting a sample of members from the 114th Congress, and what I want to do is compare their legislative effectiveness, which is summarized by the variable les in the data. Again, higher les values mean more legislative productivity for the member, so we're basically going to compare how good a lawmaker each member in the set is by seeing how productive they are.
In my ggplot function, I set the data and I do the aesthetic mapping, so the x-axis is my outcome of interest and the y-axis is the units of comparison. When I add geom_point to this, I get a single point for each member, analogous to the top of a bar in a bar chart. In these next lines, I modify the theme to drop the white vertical grid lines and add dashed horizontal lines. I also change the point size and the labels, and I reorder the units on the y-axis by the value of the outcome variable in descending order. With those modifications, we get a nice, refined, minimal plot for comparison.
Finally, a closely related variation of this dot plot is called a lollipop plot, try saying that three times fast, and it's even more similar to a bar plot.
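The Cleveland dot plot steps described above might look something like this sketch. The `members` data frame, its member names, and its scores are placeholders invented for illustration, not real legislative data; only the variable name `les` comes from the lecture. The specific theme settings are one plausible way to get the effect described, not necessarily the ones used on screen.

```r
library(ggplot2)

# Hypothetical members and scores (placeholders, NOT real legislative data)
members <- data.frame(
  name = c("Member A", "Member B", "Member C", "Member D", "Member E"),
  les  = c(0.4, 2.1, 1.3, 3.0, 0.9)
)

# Outcome on the x-axis, units of comparison on the y-axis.
# reorder() sorts the members by les, so the highest score ends up at the top.
p_dot <- ggplot(members, aes(x = les, y = reorder(name, les))) +
  geom_point(size = 3) +
  labs(x = "Legislative effectiveness score (les)", y = NULL) +
  theme(
    # Drop the vertical grid lines
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    # Make the horizontal grid lines dashed
    panel.grid.major.y = element_line(linetype = "dashed")
  )

print(p_dot)
```

Read from top to bottom, the members appear in descending order of the outcome, which is what makes the comparison easy to scan.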
To make a lollipop figure, I flip the x- and y-axes in my aesthetic mapping and I use geom_point, but rather than drop the grid lines, I use geom_segment to draw a line for each category from the x-axis, that is, from a y value of zero at the bottom, up to the y value of the variable of interest. Again, this is just a very refined, simple way of making what is essentially a bar plot.
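A lollipop plot along the lines just described could be sketched as follows. As before, the `members` data frame is made up for illustration, and ordering the categories from highest to lowest score is an assumption of this sketch rather than something stated in the lecture.

```r
library(ggplot2)

# Hypothetical members and scores (placeholders, NOT real legislative data)
members <- data.frame(
  name = c("Member A", "Member B", "Member C", "Member D", "Member E"),
  les  = c(0.4, 2.1, 1.3, 3.0, 0.9)
)

# Order the categories from highest to lowest score, left to right
members$name <- reorder(members$name, -members$les)

# Categories on the x-axis, outcome on the y-axis
p_lolli <- ggplot(members, aes(x = name, y = les)) +
  # The stick: a vertical segment from y = 0 up to the member's score
  geom_segment(aes(xend = name, y = 0, yend = les)) +
  # The candy: a point at the top of each stick
  geom_point(size = 3) +
  labs(x = NULL, y = "Legislative effectiveness score (les)")

print(p_lolli)
```

Each segment plays the role of a very thin bar, which is why the result reads so much like a bar plot.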