Up to this point, we've been considering supervised learning algorithms, but now we want to look at clustering. Clustering is an unsupervised learning task, and there are a couple of different algorithms that we'll be considering. The first is called k-means clustering. With k-means clustering, the goal is for the algorithm to group similar data examples together. As we do this, we can discover patterns within the data and relationships between those data elements. The thing that the data scientist has control over with k-means clustering is the value of k, which is the number of clusters you would like to have created. The clusters are then created by finding the data examples that have the most statistical similarity with respect to the features included in the model.

If you look at the slide, you'll see an example of k-means clustering, where we have a number of data examples represented by triangles, circles, and X's. The first step in the process is that k-means randomly assigns a centroid for each of the clusters you would like to create. If we said that our k value is equal to three, then we're going to create three clusters, so three centroids are assigned at random locations within the feature space, with the data examples scattered around them.

After the centroids have been assigned, every data example is assigned to the cluster of the centroid it is closest to. Then the mean of all of the data examples in each cluster is calculated, and the centroid is moved to that new center. Once again, the closest data examples are assigned to the new location of the centroid, and this process continues as each centroid keeps moving to the mean of the examples within its cluster. Eventually, we reach a point of convergence where the centroids no longer move and there are no further changes in the assignment of data examples to clusters. We can either limit how long k-means will run by capping the number of iterations it will perform, or we can keep going until we reach convergence.

In some cases, because the centroids are initially assigned at random, the resulting clustering may not be optimal, so simply by starting again with new random locations for those centroids, we may be able to create a better model. Or we could adjust the number of clusters we are creating by adjusting the k value.

The calculation used for k-means clustering works like this: we take the position of each data example and each centroid, compute the distance between them, and assign each example to its closest centroid, so that the total within-cluster distance is as small as possible. But if this type of approach were applied globally, trying every possible assignment of examples to clusters, it would be next to impossible to complete. Even with as few as 25 data examples and four clusters, there are approximately 47 trillion possible assignments. You can see that with any realistic number of clusters or data of any size, global optimization is simply not feasible, and that is why we use a local optimization process: instead of calculating the absolute best locations for the centroids, we simply start with random ones.
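As a quick check on that 47 trillion figure: the number of ways to partition n examples into k non-empty clusters is the Stirling number of the second kind, and a few lines of Python reproduce it for n = 25 and k = 4. The helper function here is our own illustration, not something from the lesson.

```python
from math import comb, factorial

def stirling2(n, k):
    """Number of ways to partition n labeled items into k non-empty groups."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1)) // factorial(k)

print(stirling2(25, 4))  # 46771289738810 -- roughly 47 trillion possible assignments
```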
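To make that local optimization concrete, here is a minimal NumPy sketch of the loop just described (often called Lloyd's algorithm). The function name, the convergence test, and the synthetic demo data are our own choices for illustration, not from the lesson.

```python
import numpy as np

def kmeans(X, k, max_iters=100, rng=None):
    """Cluster the rows of X into k groups using the loop described above."""
    rng = np.random.default_rng(rng)
    # Step 1: start the k centroids at random locations in the feature space
    # (here, k randomly chosen data examples, a common way to do this).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: assign each example to its closest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each centroid to the mean of the examples assigned to it
        # (leaving it in place if its cluster happens to be empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop early once the centroids no longer move (convergence);
        # otherwise we stop after max_iters iterations.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Tiny demo: synthetic 2-D data with three visible groupings.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(30, 2)) for loc in ([0, 0], [5, 5], [0, 5])])
labels, centroids = kmeans(X, k=3, rng=0)
print(centroids)
```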
To recap: we start with the number of clusters we want, we randomly assign the centroids (or at least our algorithm does), and then we assign each example to its closest centroid. We move each centroid to the center of the data examples that have been assigned to it, and then we keep going, either until we reach the maximum number of iterations or until we have achieved convergence.

How do you come up with the right number of clusters? Usually, the best way to identify how many clusters you should have, that is, what the value of k should be, is knowledge of the domain. If you happen to know a little bit about the problem you're trying to solve and the data that's involved, then you may be able to identify how many clusters you're looking for. For example, maybe you're trying to discover the demographics of the individuals who prefer each of three different flavors of ice cream: chocolate, vanilla, and strawberry. Because your goal is to figure out which combination of characteristics is drawn to each of those flavors, and because there are three flavors, you would likely set k to three, since that will group your data examples into three clusters, which is exactly what you want (see the sketch below). There are other mechanisms that can help us estimate k, and we'll discuss those later.
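As a usage sketch of that scenario, here is how it might look with scikit-learn's KMeans. The customer data below is randomly generated stand-in data, and the feature count and all the numbers are invented purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.random((300, 3))  # 300 hypothetical customers, 3 numeric features each

# k=3 because we expect one cluster per flavor. n_init=10 re-runs the
# algorithm from ten different random centroid initializations and keeps
# the best result -- the "start again with new random locations" idea
# mentioned earlier.
model = KMeans(n_clusters=3, n_init=10, max_iter=300, random_state=0).fit(X)

print(model.labels_[:10])      # cluster assignment for the first 10 customers
print(model.cluster_centers_)  # final centroid positions
print(model.inertia_)          # total within-cluster squared distance
```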