Over the course of the last unit we've discussed a range of different inference methods, some exact, some approximate, sometimes solving the problem of computing marginals and sometimes the problem of computing MAP assignments. The question is, if we're faced with a particular task and we've designed a graphical model for it, which inference algorithm should we use? What are some of the choices, and what guidance can we offer about which method to pick?

The first question we should answer for ourselves is which task is more suitable for our problem: are we trying to compute a MAP assignment, or are we trying to compute marginals? These involve trade-offs that are not always obvious. Computing marginals is, by and large, the less fragile way to go, because we don't commit to a single hypothesis. We compute a probability distribution over a range of options, and we can see, for example, whether our top option is only a little more likely than the second best or a lot more likely. That gives us some guidance as to how robust our inferences are, and it gives us a measure of confidence that we can report to the user as guidance on how much to trust the answers produced by the system. As we've also seen, computing probabilities is important for supporting decision making, for example once we need to integrate with a utility model.

MAP, on the other hand, is suitable in its own set of contexts, for example when we're trying to compute a coherent joint assignment, as in a speech recognition or image segmentation problem. Sometimes it's more important that we get an answer that makes sense as a whole than that we get the best possible answers to individual pieces of the inference problem, such as individual pixels or individual phonemes, in a way that doesn't fit together in the context of the larger problem. So the notion of coherence is sometimes important, and is worth the trade-off of reduced robustness. We've also seen that, from a computational perspective, MAP has a range of tractable model classes that are not tractable for computing marginals, so for computational reasons we might sometimes adopt a MAP solution simply for improved efficiency. And we've seen that, especially when using approximate inference, the MAP formulation often provides theoretical guarantees on how close our answers are to the correct answers, for example in the context of dual decomposition, whereas it's difficult to get a similar level of confidence in the accuracy of approximate inference for computing marginals.

On the other hand, when running approximate inference, there are again different trade-offs between these two classes of problems. When computing marginals, approximate inference algorithms often benefit from the fact that errors are attenuated simply because the assignments are soft: a source of inaccuracy in one part of the model often doesn't make its way significantly into other parts. So you often get more robust answers from approximate inference when computing marginals.
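Before going further, here is a tiny concrete illustration of the difference between the two query types. This is only a sketch over an invented two-variable model, evaluated by brute-force enumeration; the probability tables are hypothetical and chosen purely to show that the most likely joint assignment can disagree with the per-variable marginal winners.

```python
import itertools
import numpy as np

# Two binary variables with made-up tables: P(X0) and P(X1 | X0).
p_x0 = np.array([0.6, 0.4])                       # P(X0)
p_x1_given_x0 = np.array([[0.5, 0.5],             # P(X1 | X0 = 0)
                          [0.9, 0.1]])            # P(X1 | X0 = 1)

def joint(x0, x1):
    return p_x0[x0] * p_x1_given_x0[x0, x1]

assignments = list(itertools.product([0, 1], repeat=2))

# Marginal query: P(Xi = v), summing out the other variable.
marg_x0 = np.array([sum(joint(v, x1) for x1 in (0, 1)) for v in (0, 1)])
marg_x1 = np.array([sum(joint(x0, v) for x0 in (0, 1)) for v in (0, 1)])
print("P(X0):", marg_x0, " P(X1):", marg_x1)      # [0.6 0.4]  [0.66 0.34]

# MAP query: the single highest-probability joint assignment.
map_assignment = max(assignments, key=lambda x: joint(*x))
print("MAP assignment:", map_assignment)          # (1, 0), probability 0.36

# The per-variable marginal argmaxes are (0, 0), whose joint probability is
# only 0.30: the coherent joint answer and the individually best answers
# genuinely disagree, which is the trade-off described above.
print("per-variable argmaxes:",
      (int(np.argmax(marg_x0)), int(np.argmax(marg_x1))))
```

In a real model we would of course use one of the inference algorithms discussed next rather than enumerating all assignments.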
On the MAP side, by contrast, one can in many cases, at least with some algorithms such as dual decomposition, gauge whether the algorithm is working by looking at the value of the assignment it returns. So again, somewhat different trade-offs, and in some applications people actually try both and see which one works better in terms of performance on the downstream tasks that one cares about.

So what are some algorithms that we've discussed for marginal inference? First is just good old exact inference, say variable elimination and clique trees, and by and large, if the problem is small enough that exact inference fits in memory, in terms of the sizes of the cliques and the sepsets, it's probably best to use exact inference. You run into fewer problems that way, and if it fits in memory, it's probably going to be very fast. If one is not in this fortunate situation, we've discussed different types of approximate algorithms. There's the range of algorithms that do message passing over the graph, of which loopy message passing is one of the more common ones, but not the only one. And then there is the class of sampling methods that sample from the distribution. These are different categories of algorithms, and frankly, it's often difficult to tell in advance which of them is going to work better for a given class of models. We'll talk a little about factors that might favor one of these algorithms over another on a subsequent slide in a couple of minutes.

What about algorithms for MAP? Once again, we have algorithms that perform exact inference, and in this case it's actually a broader spectrum than in the case of computing marginals. In addition to cases where we have low tree width, which is the category for which exact marginal algorithms work, we've also seen other examples, such as models with associative or regular potentials, and multiple other cases. Once again, if you can perform exact inference, that's often the best way to go; it's often going to give you the best performance if it's tractable. Then there's the class of methods that are based on optimization. Those can be exact methods, which puts us back in the previous category, but more often we're going to find ourselves having to use some kind of approximate method, such as the dual decomposition algorithm that we've talked about. Those methods, as we've said, often lend themselves to at least being able to estimate the performance of our algorithm relative to the optimal answer, even if we can't reach that optimum. Finally, there is the range of algorithms that perform search over the space of assignments. This can be simple hill-climbing search, which is a fairly straightforward application of standard search methodologies, but it's also quite common to use some kind of sampling, like Markov chain Monte Carlo, to explore a range of different assignments in the space and then select among those the one that has the highest score, or the highest log-probability (a small sketch of this search-based approach follows below). That is a quite commonly used technique; it doesn't provide the same guarantees that you might get from the optimization methods, but it's often very easy to implement, so it's quite frequently used.

So if we're resorting to the use of approximate inference, what are some of the issues that might make our lives more complicated, or might favor one algorithm over another?
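Before turning to those complicating factors, here is a minimal sketch of the search-based approach to MAP just described: coordinate-wise hill climbing with random restarts, keeping the best-scoring assignment found. The score function below is a made-up stand-in for a model's log-probability (a sum of log-factors), not any particular model from the course.

```python
import random

def score(assignment):
    # Hypothetical unnormalized log-probability of a full assignment
    # (a sum of log-factor values); replace with the model's own scorer.
    x = assignment
    return (3 if x[0] == x[1] else 0) + (2 if x[1] != x[2] else 0) + x[2]

def hill_climb_map(num_vars, num_vals, restarts=20, max_steps=100):
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        x = [random.randrange(num_vals) for _ in range(num_vars)]
        for _ in range(max_steps):
            improved = False
            # Try changing each variable to each value; take any improving move.
            for i in range(num_vars):
                for v in range(num_vals):
                    candidate = x[:i] + [v] + x[i + 1:]
                    if score(candidate) > score(x):
                        x, improved = candidate, True
            if not improved:
                break                      # local optimum reached
        if score(x) > best_score:
            best, best_score = x, score(x)
    return best, best_score

print(hill_climb_map(num_vars=3, num_vals=2))
```

An MCMC variant would replace the greedy moves with sampled moves while still keeping track of the best-scoring assignment seen; as noted above, neither version comes with the guarantees of the optimization-based methods.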
So one complication has to do with the connectivity structure, that is, just how densely connected the model is. By and large, the more densely connected the model, the worse things are for message-passing algorithms. Message-passing algorithms don't like densely connected models, because messages are propagated over very short cycles, and that can cause problems both with convergence and with accuracy. Sampling methods are less affected by the connectivity structure.

The second complicating factor is the strength of influence, that is, the extent to which nodes that are connected to each other are tightly coupled in terms of their preference for certain combinations of values. In general, the stronger the influence, the harder it is for both categories of algorithms, because it creates strong coupling between variables. That complicates both message-passing algorithms and sampling algorithms; for example, if you think about plain Gibbs sampling, strong coupling makes it very difficult to move away from the current configuration to a different one. Strength of influence becomes an even worse problem when the influences go in different directions, for example, when you have loops where one path pushes the variables toward one configuration and a different path pushes them toward a different configuration. We saw this in the misconception network, where one path wants a pair of variables to agree on their values and the other wants them to disagree. That really does create significant problems for both classes of methods. The reason for that, and this I think is at the heart of what makes approximate inference hard, is the shape of the likelihood function: multiple peaks in the likelihood function make life difficult for most approximate algorithms, and the sharper these peaks, the more complicated things get. Multiple peaks are generated by strong influences that go in opposite directions, and that is really where a lot of the complexity in approximate inference comes from.

Okay, so now, assuming we have a model with these problematic issues, how do we address them? The first step is to look at the network and identify the problem regions, that is, subsets of variables that are tightly coupled and perhaps subject to opposing influences. Then we try to think about how we might make the inference in these problem regions more exact. If we have a tightly coupled region, maybe with opposing influences, how do we prevent our approximate inference algorithm from falling into the trap of giving imprecise answers, or failing to converge, in that particular area? For example, in cluster graph methods we can put the entire problem region into a single cluster. That costs us something on the computational side, because we have to deal with larger clusters, but it might be well worth it in terms of the improvement in performance. In sampling methods, we might consider proposal moves over multiple variables: instead of sampling individual variables one at a time, we can sample the entire block jointly, using a somewhat more expensive sampling procedure, but again that might end up being well worthwhile in terms of the overall performance profile of the algorithm.
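As a deliberately tiny sketch of that last idea, the snippet below resamples a tightly coupled pair of variables as a single block from its joint conditional, rather than one variable at a time as plain Gibbs sampling would. The log_score function and the choice of block are hypothetical placeholders for a real model's log-factors and problem region.

```python
import itertools
import math
import random

def log_score(x):
    # Made-up log-factors: strong coupling between x[0] and x[1],
    # a weaker link between x[1] and x[2].
    return (4.0 if x[0] == x[1] else 0.0) + (1.0 if x[1] == x[2] else 0.0)

def block_gibbs_step(x, block, num_vals):
    """Resample the variables in `block` jointly, conditioned on the rest."""
    candidates, weights = [], []
    for vals in itertools.product(range(num_vals), repeat=len(block)):
        y = list(x)
        for i, v in zip(block, vals):
            y[i] = v
        candidates.append(y)
        # With the other variables fixed, the block's conditional is
        # proportional to exp of the full log-score.
        weights.append(math.exp(log_score(y)))
    return random.choices(candidates, weights=weights, k=1)[0]

# Usage: in a full sampler one would alternate this joint update of the
# coupled block with updates of the remaining variables; here we just
# repeatedly resample the block for brevity.
x = [0, 1, 0]
for _ in range(1000):
    x = block_gibbs_step(x, block=[0, 1], num_vals=2)
print("final sample:", x)
```

The joint update enumerates all value combinations of the block, which is the extra computational cost mentioned above, but it lets the chain jump between configurations that single-variable moves would almost never cross between.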
And finally, if we're thinking about this in the context of a MAP problem, we can put the entire set of variables into a single slave in a dual decomposition, which again costs us something on the computational side but can significantly improve the convergence profile of the algorithm (see the sketch at the end of this section). So really, when we're faced with a problematic model, one that doesn't immediately succumb to standard inference techniques, it turns out that the design of inference is often a bit of an art. That is, we need to study our model in depth, think about how to deal with its different pieces, and decide which inference method is best equipped to handle each of them. We often find that, for complicated models, you get the best performance from a combination of different inference algorithms, where some pieces, such as these larger tightly coupled blocks, are handled using exact inference, in the context of an approximate inference method such as sampling or belief propagation. And so one needs to think creatively about the inference problem in the context of these more problematic models.
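As one illustration of these last two points, combining an exactly handled piece with a larger optimization scheme, here is a minimal dual-decomposition sketch in which a tightly coupled pair of variables is kept together in a single slave that is solved exactly by enumeration, while a second slave handles the remaining factor; the Lagrange multipliers push the two slaves to agree on the shared variable. The factor tables and step-size schedule are invented purely for illustration.

```python
import itertools
import numpy as np

# Log-factors (made up): (X0, X1) strongly prefer to agree,
# (X1, X2) more weakly prefer to disagree. X1 is shared between the slaves.
theta_A = np.array([[5.0, 0.0], [0.0, 5.0]])   # slave A: the coupled pair (X0, X1)
theta_B = np.array([[0.0, 3.0], [3.0, 0.0]])   # slave B: the pair (X1, X2)

lam = np.zeros(2)                              # multipliers lambda(x1)

for t in range(1, 201):
    # Each slave solves its own small MAP exactly, with lambda added to one
    # copy of X1 and subtracted from the other.
    x0A, x1A = max(itertools.product((0, 1), repeat=2),
                   key=lambda x: theta_A[x[0], x[1]] + lam[x[1]])
    x1B, x2B = max(itertools.product((0, 1), repeat=2),
                   key=lambda x: theta_B[x[0], x[1]] - lam[x[0]])

    if x1A == x1B:
        break                                  # slaves agree on the shared variable
    # Subgradient step on lambda to push the two copies of X1 toward agreement.
    step = 1.0 / t
    lam[x1A] -= step
    lam[x1B] += step

print("assignment:", (x0A, x1A, x2B), "value:",
      theta_A[x0A, x1A] + theta_B[x1A, x2B])
```

In this toy case the slaves agree almost immediately; in harder models the multiplier updates run for many rounds. When the slaves do agree, the decoded assignment attains the dual bound, which is exactly the kind of certificate of solution quality mentioned earlier for optimization-based MAP methods.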