So as we said at the very beginning, there are two main families of graphical models: those based on directed acyclic graphs, and those based on undirected graphs. The undirected graphical models are typically called Markov networks; they're also called Markov random fields. We're going to start by talking about the simplest subclass of those, which is pairwise Markov networks, and then we're going to generalize it. So let's look again at a toy example, just to illustrate what's going on. This is an example of four people who are studying together in study pairs. Notice that Alice and Charles don't get along, and Bob and Debbie had a bad breakup, so they don't talk to each other either. So really, we only have the study pairs that are marked by the edges in this diagram. The random variables here indicate whether each student has a particular misconception, because the material was a little confusing; the random variable says, does the student have this particular misconception or not. The intuition here is that if two students study together, they influence each other. So, for example, if Alice and Bob study together, then this edge indicates that if one of them has the misconception, the other one is likely to have the misconception too. Now, this doesn't fit neatly into the purview of a directed graph, because the influence flows in both directions, and so you can't really point an arrow from Alice to Bob or from Bob to Alice. So we're going to use an undirected graph to represent this. That's great, but how do you parameterize an undirected graph? You no longer have the notion of a conditional probability distribution, because there's no variable that you condition on. So how do we do this? We're going to use the general notion of a factor, which we defined previously.
And notice that this really is a general factor, in that the numbers are not even within the range [0, 1]. Now, what do these factors mean? These factors have many names. They're called affinity functions or compatibility functions; they're also called soft constraints in different settings. What these numbers mean is the local happiness of the variables A and B with a particular joint assignment. So here we can see that the happiest assignment, as far as A and B are concerned in isolation from everything else, is a0, b0. This is the case where neither student has the misconception; that's the happy assignment. We see that the second happiest assignment is a1, b1, where again the students agree, and in this case they both have the misconception. And finally, the two assignments in the middle are the least happy of all. Now, this is a local happiness, and we have similar notions of happiness for the other pairs in the graph. In the case of B and C, we see not only that there is a strong sentiment in favor of agreement, but that it's much stronger than in the A, B case. So B and C really like to agree with each other; it's very difficult for them to have opposing opinions. On the other hand, Charles and Debbie like to argue with each other all the time. If one of them says it's going to rain today, the other one is going to say it's sunny today. So you can see that the preferred assignments, in their local opinion, are the ones where they disagree with each other. And again, A and D like to agree. So this is describing the overall state by a bunch of little pieces. How do we put these pieces together to define a joint probability distribution? We're going to use the notion of a product of factors: we take all these factors and multiply them together. That's great.
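To make the product-of-factors construction concrete, here is a minimal Python sketch. The specific factor values are my assumptions, chosen to match the lecture's qualitative description (phi1's 30 and 10 fit the "three times higher than the next highest value" remark; the 100s make the other three factors strictly stronger than phi1):

```python
import itertools

# Illustrative pairwise factors over binary variables A, B, C, D (values assumed).
# phi1 prefers agreement mildly; phi2 and phi4 prefer agreement strongly;
# phi3 strongly prefers disagreement.
phi1 = {(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0}    # A,B
phi2 = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}  # B,C
phi3 = {(0, 0): 1.0, (0, 1): 100.0, (1, 0): 100.0, (1, 1): 1.0}  # C,D
phi4 = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}  # D,A

def p_tilde(a, b, c, d):
    """Unnormalized measure: the product of all four pairwise factors."""
    return phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]

# One entry of the unnormalized measure per joint assignment of (A, B, C, D):
for assignment in itertools.product([0, 1], repeat=4):
    print(assignment, p_tilde(*assignment))
```

Note that the entries this prints are not probabilities; they are just products of local affinities, which is exactly the issue the next step addresses.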
Except that this is in no way, shape, or form a probability distribution, because its numbers aren't even in the interval [0, 1], which is why you'll notice there is a little tilde on top of the P. The tilde denotes an unnormalized measure. So how do we turn an unnormalized measure into a probability distribution? Well, we normalize it. Actually, sorry, before that: here is the unnormalized measure, so that you can see it, just to highlight the point. How do we turn this unnormalized measure into a probability distribution? We normalize it, and that's the normalization here. It has a name: it's called the partition function, for historical reasons that come from its origins in statistical physics, and I'm not even going to describe why it's called that. But you can think of it simply as the normalizing constant that makes all of these entries sum to one. We're going to get it by simply summing up all these entries, which gives us the value Z. And if we divide all of these entries by Z, we get a normalized probability distribution, and that is the probability distribution defined by this graph. Now let's think about what these factors mean, and in particular about the factor phi1(A, B), which is this local happiness between A and B, and how it relates to the probability distribution. We might think that this is the marginal probability of A and B in the joint distribution, or maybe it's the conditional distribution of A given B, or maybe B given A, or maybe it's the joint probability of A and B given C or D. The answer is none of the above. Let's go back and look at what this actually means in this particular context. Here we have the set of factors that we used to construct this distribution, and here, trust me, is the marginal probability of A and B as defined by the set of factors Phi.
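Continuing the same sketch (factor values assumed, as before), the normalization step is just: sum the unnormalized measure over every joint assignment to get the partition function Z, then divide each entry by Z.

```python
import itertools

# Same illustrative factor values as before (assumed for this sketch).
phi1 = {(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0}    # A,B
phi2 = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}  # B,C
phi3 = {(0, 0): 1.0, (0, 1): 100.0, (1, 0): 100.0, (1, 1): 1.0}  # C,D
phi4 = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}  # D,A

def p_tilde(a, b, c, d):
    """Unnormalized measure: product of the four pairwise factors."""
    return phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]

assignments = list(itertools.product([0, 1], repeat=4))

# Partition function: sum of the unnormalized measure over all 16 assignments.
Z = sum(p_tilde(*x) for x in assignments)

# Dividing by Z yields a proper joint distribution P(A, B, C, D).
P = {x: p_tilde(*x) / Z for x in assignments}
assert abs(sum(P.values()) - 1.0) < 1e-12
```

The design point: Z is a global quantity; it depends on every factor at once, which is one reason factors in a Markov network, unlike CPDs in a Bayesian network, have no standalone probabilistic meaning.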
I'm going to use a little subscript Phi here, to denote the fact that this marginal was derived from the set of factors Phi. Let's compare this distribution to the factor phi1. We can see that it doesn't even respect the fact that A and B like to agree with each other. In phi1, A and B like to agree with each other by a lot; remember, the top value is three times higher than the next highest value. But in the marginal, that assignment only has probability 0.13, and the other high-value assignment, the one whose factor entry was 10, has only 0.04. What is the single highest-probability assignment here? It's one where A and B disagree. Let's think about why that is. This probability distribution is constructed by multiplying all four of these factors, and if you think about what's going on, you see that B really, really likes to agree with C, so those two are very closely tied together. A and D similarly like to agree, so they are very closely tied together too. And C and D strongly like to disagree; they like to have opposite values. Now, all three of these factors are stronger than phi1, in the sense that the differences between their assignments are bigger. So where are you going to break the cycle? You can't have D agreeing with A, A agreeing with B, B agreeing with C, and C disagreeing with D. It doesn't work. Somewhere this loop has to be broken, and the place where it gets broken is between A and B, because that is the weakest factor. So the A, B marginal is actually some kind of complicated aggregate of the different factors that are used to compose the Markov network. And this is actually an important point, because it is going to come back and haunt us in later parts of the course: there isn't a natural mapping between the probability distribution and the factors that are used to compose it. You can't just look at the probability distribution and say, aha,
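Here is the same toy model once more (factor values assumed, as before), with a sketch of computing the marginal over A and B by summing out C and D. It reproduces the point of the discussion: the marginal's mode is a disagreement between A and B, even though phi1 by itself strongly prefers agreement.

```python
import itertools

# Illustrative factor values (assumed); phi1 is the weakest of the four.
phi1 = {(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0}    # A,B: agree
phi2 = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}  # B,C: agree
phi3 = {(0, 0): 1.0, (0, 1): 100.0, (1, 0): 100.0, (1, 1): 1.0}  # C,D: disagree
phi4 = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}  # D,A: agree

def p_tilde(a, b, c, d):
    return phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]

Z = sum(p_tilde(a, b, c, d)
        for a, b, c, d in itertools.product([0, 1], repeat=4))

# Marginal over (A, B): sum the normalized measure over C and D.
marg = {(a, b): sum(p_tilde(a, b, c, d)
                    for c, d in itertools.product([0, 1], repeat=2)) / Z
        for a, b in itertools.product([0, 1], repeat=2)}

# phi1's happiest assignment is agreement at (0, 0), but the marginal's
# mode is a disagreement: the cycle of stronger factors breaks at A-B.
assert max(phi1, key=phi1.get) == (0, 0)
assert max(marg, key=marg.get) == (0, 1)
```

With these assumed numbers, the agreeing assignments come out around 0.13 and 0.04, matching the values quoted in the discussion, while the disagreement (0, 1) takes most of the mass.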
this piece of it came from that factor. This is in direct contrast to Bayesian networks, where the factors were all conditional probabilities, and you could just look at the distribution and compute them. Here you can't do that, and that often turns out to affect things like how we can learn these factors from data, because you can't just extract them directly from the probability distribution. So, with that intuition, we can now go ahead and define a pairwise Markov network. And I'm defining it explicitly because pairwise Markov networks are a sufficiently commonly used subclass of general Markov networks that it's worth giving them their own place. A pairwise Markov network is an undirected graph whose nodes are random variables X1 up to Xn, and each edge connecting Xi to Xj is associated with a factor, also known as a potential, phi_ij(Xi, Xj). That's a pairwise Markov network. And here's an example of a slightly larger Markov network: one in the form of a grid. This is the kind of network that's used, for example, when we're doing various operations on images, because then the variables correspond to pixels. And this is the Markov network that corresponds to image segmentation when we're using superpixels, in which case it's no longer a regular grid.
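As a sketch of how the grid case is set up (the function names and potential values here are mine, not from the lecture): a pairwise Markov network over an image needs one label variable per pixel and one potential per neighboring pair, for instance a smoothness potential that rewards adjacent pixels taking the same label.

```python
def grid_edges(rows, cols):
    """Edges of a rows-by-cols pairwise Markov network: each pixel
    variable connects to its right and lower neighbors."""
    edges = []
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:
                edges.append(((i, j), (i, j + 1)))  # horizontal edge
            if i + 1 < rows:
                edges.append(((i, j), (i + 1, j)))  # vertical edge
    return edges

def smoothness_phi(xi, xj, same=10.0, diff=1.0):
    """An illustrative edge potential: neighboring pixels are 'happier'
    when they agree on a label (e.g. foreground vs. background)."""
    return same if xi == xj else diff

# A 3x3 grid has 12 edges: 6 horizontal and 6 vertical.
edges = grid_edges(3, 3)
```

In the superpixel variant mentioned above, only `grid_edges` would change: the edge list would come from the adjacency of the irregular superpixel regions, while the per-edge potentials keep exactly the same form.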