Now we're going to move on to the task of joining our two data sets together. When we speak about joining data, we're talking about taking two data sets and matching rows (observations) based on some values in those rows. While the dplyr convention is to call the data sets x and y, it's actually more traditional to think of them as a data frame on the left and a data frame on the right. We then choose a joining strategy based on a shared variable in each. The joining functionality in dplyr is similar to that of a relational database, so if you know relational databases, this is all going to be familiar.

The key consideration in joining data frames is what to do when some values in your join column (in this example, the garbage can identifier) either don't exist in one data set or exist multiple times in a data set. Because of this, there are actually four different kinds of joining functions available to us. The first is the inner join, which keeps only the rows whose key values appear in both the left and the right data frames, and joins them together. Rows whose key is missing from either side are dropped, and if a key exists two or more times on one side, then it's joined two or more times.

Let's take a look at how this would work. Let's imagine that we have a little data frame about the people teaching this course in the specialization. I'm going to call this left. We've got Chris Brooks, Paula Lance, and Alton Worthington, along with the schools they're associated with: Information, Public Policy, and Public Policy. Now let's imagine that we have another data frame holding the core development team who helped put this course together. These people are amazing contributors to this and other courses in the specialization, so I'm happy to recognize their efforts with this little example. We've got numerous people involved: of course the instructors, but we also have Abby, Emily, Matt, Ahmed, and Daniella, some of whom you've gotten a chance to meet.
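To make the inner join rule concrete, here is a minimal plain-Python sketch of the matching logic. It is not dplyr itself: lists of dictionaries stand in for data frames, the function name `inner_join` is just a label for the sketch, and the toy data is hypothetical rather than taken from the course. It shows both halves of the rule: keys missing from either side are dropped, and a key that appears twice on one side is joined twice.

```python
def inner_join(left, right, by):
    """Return rows whose `by` values appear in both `left` and `right`.

    `left` and `right` are lists of dicts (one dict per row) and `by` is
    a list of key column names. Every matching pair is emitted, so a key
    that appears twice on one side yields two output rows.
    """
    out = []
    for l_row in left:
        for r_row in right:
            if all(l_row[k] == r_row[k] for k in by):
                # Sketch assumption: non-key column names differ between
                # the two sides, so merging the dicts loses nothing.
                out.append({**l_row, **r_row})
    return out


# Hypothetical toy data: key "a" appears twice on the right,
# "b" and "c" exist only on the left, "d" only on the right.
left = [{"key": "a", "lval": 1}, {"key": "b", "lval": 2}, {"key": "c", "lval": 3}]
right = [{"key": "a", "rval": 10}, {"key": "a", "rval": 11}, {"key": "d", "rval": 40}]

result = inner_join(left, right, ["key"])
print(result)
# [{'key': 'a', 'lval': 1, 'rval': 10}, {'key': 'a', 'lval': 1, 'rval': 11}]
```

The nested loop is deliberately naive so the rule is easy to read; a real implementation would index one side by key instead of scanning it for every row.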
In the right data frame I put roles, such as media, learning, and coordination, and for some people I just put last initials. Let's take a look at those two data frames. This is the left data frame: you can see that it's got Chris, Paula, and Alton, along with our schools. And this is the data frame on the right: it still has Chris and Paula, but then we've got Abby, Emily, Matt, Ahmed, and Daniella.

The inner join is going to retain all keys which exist in both the left and the right hand data frames. In this case, first name uniquely identifies people, so it's the one I'd like to match on. The only first name values which are in both data frames are Chris and Paula, so when we match on this, we get a single joined data frame with two rows. Notice that the second name variable is repeated, because it exists with the same name in both data frames, the left and the right. dplyr doesn't want to throw away data, so, if disambiguation is needed, it adds a suffix (.x and .y by default) to each of these columns to indicate where it came from. One way we could suppress this is to join on the combination of names, which is natural here since people have a first and a last name. Just as with group_by, we can use several columns; here we pass a vector of the column names that we want to join on. This is an "and" operation, so you could think of dplyr merging all of these columns into one giant string on either side, and then joining on those.

Now, a left join keeps all rows in the left hand side argument, and only those rows from the right hand side that match. For all values which exist only in the left hand side, the right hand side columns will be filled with NA values.
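The suffixing behavior and multi-column keys can be sketched the same way. This is a plain-Python illustration, not dplyr's implementation: the column names `first_name`, `second_name`, `school`, and `role`, the function name `inner_join`, and every right-hand value marked `placeholder` are hypothetical. Only the people's names and schools come from the lecture, and the `suffix` default mirrors dplyr's `.x`/`.y`.

```python
def inner_join(left, right, by, suffix=(".x", ".y")):
    """Inner join on lists of dicts, matching on every column in `by`.

    Non-key columns present on both sides are renamed with `suffix`,
    mimicking dplyr's default ".x"/".y" disambiguation.
    """
    out = []
    for l_row in left:
        for r_row in right:
            # An "and" match: every key column must be equal.
            if all(l_row[k] == r_row[k] for k in by):
                row = {k: l_row[k] for k in by}
                shared = (set(l_row) & set(r_row)) - set(by)
                for k, v in l_row.items():
                    if k not in by:
                        row[k + suffix[0] if k in shared else k] = v
                for k, v in r_row.items():
                    if k not in by:
                        row[k + suffix[1] if k in shared else k] = v
                out.append(row)
    return out


left = [
    {"first_name": "Chris", "second_name": "Brooks", "school": "Information"},
    {"first_name": "Paula", "second_name": "Lance", "school": "Public Policy"},
    {"first_name": "Alton", "second_name": "Worthington", "school": "Public Policy"},
]
# Hypothetical right-hand rows; "placeholder" marks invented values.
right = [
    {"first_name": "Chris", "second_name": "Brooks", "role": "placeholder"},
    {"first_name": "Paula", "second_name": "Lance", "role": "placeholder"},
    {"first_name": "Abby", "second_name": "placeholder", "role": "placeholder"},
]

by_first = inner_join(left, right, ["first_name"])
by_both = inner_join(left, right, ["first_name", "second_name"])
print(list(by_first[0]))  # ['first_name', 'second_name.x', 'school', 'second_name.y', 'role']
print(list(by_both[0]))   # ['first_name', 'second_name', 'school', 'role']
```

Joining on the first name alone leaves `second_name` on both sides, so it gets suffixed; adding it to the key removes the duplicate column.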
So, for instance, if we do a left join, pass in the left and the right data frames, and join on both the first name and the second name, we end up keeping Alton in the final data, along with Chris and Paula. You can see in this case that Alton's role is set to NA, because that column comes from the right hand data frame, which has no matching row for him.

The right join does the opposite, keeping all the rows in the right hand side argument, and rows from the left hand side only if they match. So in this case Alton is going to be gone, but everybody else comes in, and Chris and Paula continue to be there. In practice, it's rare to see anybody actually do a right join, because you can just flip the arguments and get the same effect with a left join, and we tend to teach left joins first. Functionally, the flipped left join gives the same result as the right join; the column ordering is different, but that's about it. But in the world of the tidyverse, where piping is common, right joins see a little bit more usage, so I thought you should be aware of them.

Finally, the full join keeps all of the rows from the left and the right, and fills in NA values on both sides as appropriate. This gets us a list of absolutely everybody involved. We can see everybody is included here, with Alton getting an NA value for the role, while Emily, Matt, and others get an NA for the school variable.
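All four joins differ only in which unmatched rows they keep, which a single plain-Python sketch can show. This is an illustration of the semantics, not dplyr: `join`, its `how` parameter, the helper `as_set`, and the column names are hypothetical, `None` stands in for R's `NA`, and the right-hand values marked `placeholder` are invented. The sketch assumes the only columns shared by both sides are the key columns.

```python
def join(left, right, by, how="inner"):
    """Sketch of inner/left/right/full join semantics on lists of dicts.

    None stands in for NA. Assumes non-key column names don't overlap.
    """
    # Non-key columns on each side, used to build NA-filled rows.
    left_cols = [c for c in dict.fromkeys(c for row in left for c in row) if c not in by]
    right_cols = [c for c in dict.fromkeys(c for row in right for c in row) if c not in by]
    out = []
    matched_right = set()  # indices of right rows that found a partner
    for l_row in left:
        found = False
        for i, r_row in enumerate(right):
            if all(l_row[k] == r_row[k] for k in by):
                found = True
                matched_right.add(i)
                out.append({**l_row, **r_row})
        if not found and how in ("left", "full"):
            # Left row with no partner: right-hand columns become NA.
            out.append({**l_row, **{c: None for c in right_cols}})
    if how in ("right", "full"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                # Right row with no partner: left-hand columns become NA.
                out.append({**{c: None for c in left_cols}, **r_row})
    return out


def as_set(rows):
    """Compare results while ignoring row and column order."""
    return {frozenset(r.items()) for r in rows}


left = [
    {"first_name": "Chris", "second_name": "Brooks", "school": "Information"},
    {"first_name": "Paula", "second_name": "Lance", "school": "Public Policy"},
    {"first_name": "Alton", "second_name": "Worthington", "school": "Public Policy"},
]
right = [
    {"first_name": "Chris", "second_name": "Brooks", "role": "placeholder"},
    {"first_name": "Paula", "second_name": "Lance", "role": "placeholder"},
] + [
    {"first_name": n, "second_name": "placeholder", "role": "placeholder"}
    for n in ["Abby", "Emily", "Matt", "Ahmed", "Daniella"]
]

by = ["first_name", "second_name"]
left_j = join(left, right, by, "left")
right_j = join(left, right, by, "right")
full_j = join(left, right, by, "full")
print([r["first_name"] for r in full_j])
# ['Chris', 'Paula', 'Alton', 'Abby', 'Emily', 'Matt', 'Ahmed', 'Daniella']

# A right join equals a left join with the arguments flipped,
# up to row and column order.
print(as_set(right_j) == as_set(join(right, left, by, "left")))  # True
```

Keeping unmatched left rows is what separates a left join from an inner join, and the full join simply does that for both sides.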