The K-means algorithm requires as one of its inputs k, the number of clusters you want it to find. But how do you decide how many clusters to use? Do you want two clusters, or three clusters, or five clusters, or ten clusters? Let's take a look.

For a lot of clustering problems, the right value of K is truly ambiguous. If I were to show different people the same data set and ask, "How many clusters do you see?", there would definitely be people who say it looks like there are two distinct clusters, and they would be right. There would also be others who see four distinct clusters, and they would also be right. Because clustering is an unsupervised learning algorithm, you're not given the "right" answers in the form of specific labels to try to replicate. There are lots of applications where the data itself does not give a clear indicator of how many clusters there are. I think it truly is ambiguous whether this data has two, three, or four clusters, depending on whether you group, say, the red cluster here together with the two blue ones.

If you look at the academic literature on K-means, there are a few techniques for trying to automatically choose the number of clusters for a given application. I'll briefly mention one here that you may see others refer to, although I have to say, I personally do not use this method myself. One way to try to choose the value of K is called the elbow method. You run K-means with a variety of values of K and plot the cost function, or distortion function, J as a function of the number of clusters. What you find is that when you have very few clusters, say one cluster, the distortion or cost function J will be high, and as you increase the number of clusters, it goes down, maybe as follows.
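As a concrete sketch of that curve (this is not code from the course; it's a minimal NumPy implementation of Lloyd's K-means algorithm run on made-up toy data), you can compute J for a range of values of K and see where the decrease flattens out:

```python
import numpy as np

def kmeans_distortion(X, k, n_iter=50, seed=0):
    """Run Lloyd's algorithm and return the distortion J:
    the mean squared distance from each point to its assigned centroid."""
    rng = np.random.default_rng(seed)
    # initialize centroids at k randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: each point goes to its nearest centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: move each centroid to the mean of its points
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return d2.min(axis=1).mean()

# toy data: three well-separated blobs of 50 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

# J is high at K = 1, drops sharply up to K = 3, then flattens
for k in range(1, 7):
    print(k, round(kmeans_distortion(X, k), 2))
```

Plotting those (k, J) pairs gives the elbow-style curve described above; on smoother real-world data, as noted below, the bend is often much less distinct.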
The elbow method looks at the cost function as a function of the number of clusters and checks whether there's a bend in the curve, which we call an elbow. If the curve looks like this, you'd say it looks like the cost function is decreasing rapidly until you get to three clusters, but it decreases more slowly after that, so let's choose K equals 3. This is called an elbow, by the way, because you can think of it as analogous to an arm: that's your hand, and that's your elbow over here.

Plotting the cost function as a function of K could help you gain some insight. I personally hardly ever use the elbow method myself to choose the number of clusters, because I think for a lot of applications the right number of clusters is truly ambiguous, and you find that a lot of cost functions just decrease smoothly, without a clear elbow you could use to pick the value of K. By the way, one technique that does not work is to choose K so as to minimize the cost function J, because doing so would cause you to almost always choose the largest possible value of K: having more clusters will pretty much always reduce the cost function J. Choosing K to minimize the cost function J is not a good technique.

So how do you choose the value of K in practice? Often you're running K-means in order to get clusters to use for some later, or downstream, purpose. That is, you're going to take the clusters and do something with them. What I usually do, and what I recommend, is to evaluate K-means based on how well it performs for that later downstream purpose. Let me illustrate with the example of t-shirt sizing. One thing you could do is run K-means on this data set to find the clusters, in which case you might find clusters like these, and this would be how you size your small, medium, and large t-shirts. But how many t-shirt sizes should there be? Well, it's ambiguous.
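The "minimize J over K" pitfall is easy to see in the extreme case. In this small sketch (my own illustrative example, not from the lecture), the distortion for a fixed cluster assignment is computed directly: with one cluster J is large, with the natural two clusters it's small, and with every point as its own cluster it's exactly zero — so minimizing J would always push you to the largest possible K:

```python
import numpy as np

# four points forming two obvious pairs
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]])

def distortion(X, labels):
    """Mean squared distance from each point to its cluster's centroid."""
    J = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        J += ((pts - pts.mean(axis=0)) ** 2).sum()
    return J / len(X)

print(distortion(X, np.array([0, 0, 0, 0])))  # K = 1: large J
print(distortion(X, np.array([0, 0, 1, 1])))  # K = 2: the natural grouping, small J
print(distortion(X, np.array([0, 1, 2, 3])))  # K = 4: every point its own cluster, J = 0
```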
If you were to run K-means with five clusters instead, you might get clusters that look like this. This would let you size t-shirts as extra small, small, medium, large, and extra large. Both of these are completely valid and completely fine groupings of the data into clusters, but whether you want to use three clusters or five clusters can now be decided based on what makes sense for your t-shirt business. There's a trade-off in how well the t-shirts will fit depending on whether you have three sizes or five sizes, but there will also be extra costs associated with manufacturing and shipping five types of t-shirts instead of three. What I would do in this case is run K-means with K equals 3 and with K equals 5, then look at the two solutions and weigh the better fit you get from more sizes against the extra cost of making more types of t-shirts, where making fewer types is simpler and less expensive, to decide what makes sense for the t-shirt business.

When you get to the programming exercise, you'll also see an application of K-means to image compression. This is actually one of the most fun visual examples of K-means, and there you'll see a trade-off between the quality of the compressed image, that is, how good the image looks, versus how much you can compress the image to save space. In that programming exercise, you'll see that you can use that trade-off to manually decide the best value of K, based on how good you want the image to look versus how large you want the compressed image to be. That's it for the K-means clustering algorithm. Congrats on learning your first unsupervised learning algorithm.
You now know not just how to do supervised learning, but also unsupervised learning. I hope you also have fun with the practice lab, which is actually one of the most fun exercises I know of for K-means. With that, we're ready to move on to our second unsupervised learning algorithm, which is anomaly detection: how do you look at a data set and find unusual or anomalous things in it? This turns out to be another one of the most commercially important applications of unsupervised learning. I've used it myself many times in many different applications. Let's go on to the next video to talk about anomaly detection.