Now, before I go on to talk about the similarity or dissimilarity between objects, we first need to consider the dissimilarity, or the similarity, between specific individual attributes of objects. So here, I'm going to define a notation. Let's say we've got a collection of objects, x sub i, where i runs from 1 up to N, and N is the total number of objects that we have. And now let's say that each of these objects has a number p of attributes. If you want to think of a specific example, you might think of the objects, xi, as being microarray profiles. We might have a certain number N of these microarray profiles, but each microarray profile details the expression level of a very large number, p, of genes. Now, in terms of our notation, the value of the jth attribute for the ith object is x sub ij. So this would be, for example, the expression level of gene j in sample i. Then, in terms of this notation, the dissimilarity between the jth attribute of two objects is written as dj(xij, xi'j). So the dissimilarity is d, the subscript j indicates the attribute that we're talking about, and xij and xi'j are the two attribute values of the two objects, i and i prime. Now, the choice of this function d depends on the specific nature of the attributes. The attributes could be, for example, real numbers such as gene expression levels, or they could be categorical variables. That might seem a little bit abstract, so to give you a visual picture of this idea, I have this little figure here. Let's consider two objects. The first object, x1, is on the left, and the second object, x2, is on the right, and they're represented as columns of squares. Each one of these squares indicates a specific attribute of the object. So we can see we have two objects, but we have six attributes. And we're going to think about the dissimilarity between the second attribute of these two objects, which is written as d2(x1, x2). You might like to think of this as the expression level of gene two, and we're looking at some dissimilarity between the expression levels in the two microarray samples.

The specific choice of this function d, the dissimilarity between attributes, depends on the nature of the attributes. If the attributes are quantitative, real-valued variables such as expression levels of genes, then the simplest, most natural choice for the dissimilarity between the attribute values is simply the absolute difference between the values. Which is to say, subtract one value from the other and then drop the sign. But this doesn't help us with categorical variables. Now, a categorical variable is something that has no natural ordering. It's a variable that might describe, for example, a subtype of breast cancer or a certain level of patient stratification. These categorical variables are best described, instead of with numbers, as simple discrete values, such as indicated here with K1, K2, and so on, up to KM, where M is the number of possible categories available. Now, in this case, we can't subtract these because they're not numbers. So a very common choice is to define the dissimilarity between categories such that the dissimilarity between values in the same category is zero, and the dissimilarity between values in different categories is one. So it's really just a binary choice.
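To make these two choices concrete, here is a minimal Python sketch, my own illustration rather than code from the lecture; the function names and example values are hypothetical.

```python
# A minimal sketch (not from the lecture) of the two attribute-level
# dissimilarities just described; function names are illustrative.

def quantitative_dissimilarity(x, y):
    """Absolute difference, for real-valued attributes such as expression levels."""
    return abs(x - y)

def categorical_dissimilarity(x, y):
    """0 if the two values fall in the same category, 1 otherwise."""
    return 0 if x == y else 1

# Expression level of one gene measured in two samples
print(quantitative_dissimilarity(2.5, 1.0))                    # 1.5

# Hypothetical category labels (e.g. cancer subtypes) for two samples
print(categorical_dissimilarity("subtype_K1", "subtype_K2"))   # 1 (different categories)
print(categorical_dissimilarity("subtype_K1", "subtype_K1"))   # 0 (same category)
```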
For example, the dissimilarity between K1 and K2 would be 1, but the dissimilarity between K1 and K1 would be 0. So we've looked at how to measure the dissimilarity between individual attributes of objects, but we've not yet considered how to measure the dissimilarity between the objects themselves. This is defined in terms of the dissimilarities between each of the individual attributes. The most common definition of the object dissimilarity is essentially the sum of the dissimilarities for each of the attributes. Very often, this sum is weighted, such that different attributes contribute more or less to the overall measure of dissimilarity. This allows particular attributes to have a greater influence over the perceived object dissimilarity, which can be useful, for example, where particular attributes are more useful than others in discriminating between groupings of the objects. An alternative to this are the Minkowski, or lq, distances. These are defined in a similar way, as a sum over the attribute dissimilarities, but in this case the attributes are real-valued: the absolute differences are raised to the power q, summed, and the sum is then raised to the power 1/q. Typical values for q are 1 or 2, and by choosing larger values of q, you allow larger absolute differences to have a greater influence over the perceived object dissimilarity. Another very common choice is simply the standardized correlation between the attribute values. I'll come back to these object dissimilarities with a small code sketch at the end of this segment. In this figure, I've tried to give you a visual picture of this process of aggregating attribute dissimilarities to get a measure of object dissimilarity. Just as before, let's consider two objects, one on the left and one on the right, with a number of attributes down the rows. We can consider the dissimilarity between each individual attribute of these two objects. Then we can aggregate these, perhaps in a weighted fashion, as indicated here with the wj, and take the sum. This gives us a single value which quantifies the dissimilarity between object one and object two in terms of each of their attributes.

So far we've considered ways in which we can quantify the similarity or the dissimilarity between objects, and this is going to form the basis for algorithms that will cluster our objects. What this means is that we're going to find an algorithm which takes each of our objects, i, and assigns it to a specific cluster, C(i). So C(i) is the map from the object to the cluster to which the object belongs. Now, even for a moderately sized data set, there is an astronomical number of possible partitions of the data; there is a tremendous number of ways in which we can assign each object to a different cluster. So it's very rarely feasible to explore all of the possible clustering partitions and choose the best one. Clustering algorithms therefore take different approaches to reduce the space that needs to be searched to find a solution to the clustering problem. Broadly speaking, there are two approaches: partitioning and hierarchical clustering. The partitioning methods attempt to find an optimal partitioning of the data into clusters. Hierarchical clustering takes a different approach, finding a whole tree of partitions such that the lowest levels of the tree contain the individual objects, and the root of the tree, at the very top, contains all the objects.
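Before moving on to the specific clustering approaches, here is the promised sketch of the weighted-sum and Minkowski object dissimilarities described above. Again, this is my own illustration rather than code from the lecture, and all names and values are made up.

```python
import numpy as np

# A minimal sketch (not from the lecture) of the object-level dissimilarities
# described above; the example attribute values are hypothetical.

def weighted_dissimilarity(x1, x2, weights, attr_dissim):
    """Weighted sum of per-attribute dissimilarities: sum_j w_j * d_j(x1_j, x2_j)."""
    return sum(w * attr_dissim(a, b) for w, a, b in zip(weights, x1, x2))

def minkowski_distance(x1, x2, q=2):
    """Minkowski (l_q) distance for real-valued attributes:
    (sum_j |x1_j - x2_j|**q) ** (1/q). q=1 is Manhattan, q=2 is Euclidean."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    return (np.abs(x1 - x2) ** q).sum() ** (1.0 / q)

# Two objects with six real-valued attributes, as in the figure
x1 = [2.1, 0.5, 3.3, 1.0, 0.0, 2.2]
x2 = [1.9, 1.5, 3.0, 0.5, 0.2, 2.0]
w  = [1.0] * 6  # equal weights; raise a weight to let that attribute dominate

print(weighted_dissimilarity(x1, x2, w, lambda a, b: abs(a - b)))  # ~2.4
print(minkowski_distance(x1, x2, q=1))  # ~2.4, same as the unweighted sum above
print(minkowski_distance(x1, x2, q=2))  # ~1.21; larger differences count more
```

With equal weights and absolute differences, the weighted sum coincides with the Minkowski distance for q=1; raising q gives the larger attribute differences progressively more influence.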
In this talk, I'm going to describe a specific example from each of these two groups of approaches.