Next, we're going to look at dynamic programming, which is a very important programming paradigm that's widely used in operations research and other fields. We're not going to be able to do much more than give an example and show its relationship to recursion, but that relationship is very important, and having an idea of what dynamic programming is matters for any programmer. One way to think of it is as an alternative to recursion that avoids recomputation in an organized way. In recursion, the computation goes from the top down: we start with the big problem and break it into smaller problems. The alternative that dynamic programming represents is to work on the computation from the bottom up: solve the small subproblems first, save their solutions, and then use those solutions to build bigger solutions. This approach was pioneered by Richard Bellman in the 1950s, and it's very effective in a broad variety of domains.

In the context of computing the Fibonacci numbers, what this means is just to use an array to hold your answers. We save solutions of small subproblems, set the first two entries in the array to what they ought to be, and then just go through the array, for each value of i using the previously computed subproblems to compute the answer. It's as simple as that: solve the small subproblems, and use the ones you've already solved to build bigger and bigger solutions. This approach obviously doesn't have the exponential-waste pitfall; it computes the nth Fibonacci number with just n additions or so. So that's another quick way to get the Fibonacci numbers.
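As a concrete sketch of the array-based approach just described (the class and method names here are my own, and I'm assuming the convention fib(0) = 0, fib(1) = 1):

```java
public class FibonacciDP {
    // Bottom-up Fibonacci: save the solutions to small subproblems in an
    // array and use them to build bigger ones. Uses n additions, with none
    // of the recomputation that the naive recursive version does.
    public static long fib(int n) {
        long[] f = new long[Math.max(n + 1, 2)];
        f[0] = 0;  // set the first two entries to what they ought to be
        f[1] = 1;
        for (int i = 2; i <= n; i++)
            f[i] = f[i - 1] + f[i - 2];  // use previously computed subproblems
        return f[n];
    }

    public static void main(String[] args) {
        System.out.println(fib(10));  // prints 55
    }
}
```

Each array entry is written once and read at most twice, which is exactly the "address each subproblem only once" property discussed next.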
Now, in this case the computation is much simpler than the recursive one. There are some cases where recursive solutions involving memoization are simpler, but people who apply dynamic programming to scientific problems find that the organized approach of solving small subproblems first is a natural way to approach many problems. The key advantage over the recursive solution is that you can be pretty sure you're addressing each subproblem only once.

So, we're going to look at an example called the longest common subsequence problem. This is a problem that at first thought you might think would be extremely difficult to compute. Let me introduce it. We have a string s, and a subsequence of the string is any string that you can get by deleting characters from s and closing up the gaps. So if we have the string ggcaccacg, then cac is a subsequence. Indeed, any substring is a subsequence: you just delete the other characters. But you can delete any characters you like. For example, delete the second character and we have g, then c, then a; then delete the two c's and we get another a; and so forth. There are a lot of subsequences. In fact, you can overlay a binary digit on each character to say whether you're deleting it or not, which means that there are two to the n subsequences of a string of length n. For these examples, here are proofs that those are subsequences: you pick some subset of the letters and put them together, and that's a subsequence. Here's another string. To get five g's as a subsequence, you've got to put the five g's together, and so forth. Again, here's the proof that those are subsequences. So, given two strings, you have a huge number of subsequences, an exponential number for both strings, and the computational problem is to find what's called the longest common subsequence.
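A quick way to convince yourself that a given string is a subsequence of another is the one-pass scan sketched below (the names here are hypothetical; the method just walks through s looking for the candidate's characters in order):

```java
public class SubsequenceCheck {
    // Returns true if sub is a subsequence of s: the characters of sub
    // appear in s, in order, though not necessarily contiguously.
    public static boolean isSubsequence(String sub, String s) {
        int j = 0;  // position of the next character of sub we need to find
        for (int i = 0; i < s.length() && j < sub.length(); i++)
            if (s.charAt(i) == sub.charAt(j)) j++;
        return j == sub.length();
    }

    public static void main(String[] args) {
        System.out.println(isSubsequence("cac", "ggcaccacg"));   // prints true
        System.out.println(isSubsequence("gggg", "ggcaccacg"));  // prints false: only three g's
    }
}
```

Checking one candidate is easy; the hard part of the problem is that there are exponentially many candidates to consider.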
That is: what's the longest string that's a subsequence of both of those strings? In this case, it's ggcaacg. Our goal is to find an efficient algorithm to compute the LCS. Actually, we're just going to compute the length of the LCS, but it's not much more difficult to get the actual sequence. Again, you would think, with all those possible subsequences to consider, that this would be a computationally intractable problem, and it's quite surprising that it's actually fairly easy to solve. It's important because there are numerous scientific applications where this measure of the similarity of two strings matters. With DNA sequences, we don't always get them exactly the way we want them, and this measure is a very important way to assess the relationship between different sequences; and there are many, many other applications. There are also computational applications, like how many edits it would take to get from one string to the other, and many other related problems.

So, how are we going to solve this problem? We're going to use dynamic programming. Again, we're just going to compute the length, to simplify things for the lecture. The approach is to work with the ends of the strings; remember, with dynamic programming what we want to do is solve small subproblems and then put those solutions together to solve bigger ones. It's recursive, but from the bottom up. What we're going to do is keep track of the length of the longest common subsequence of the tails of the two strings: we'll have a two-dimensional array where opt[i][j] gives the length of the longest common subsequence of the tail of S starting at i and the tail of T starting at j. It turns out there are just three cases to consider, as long as we have already computed the answer for smaller strings. The first one is easy: if either one of the strings is empty, then they have no common subsequence, because one of the strings has nothing in it.
So if we're at the end of either string, that's a subproblem of size zero, and the answer, the length of the longest common subsequence, is zero. Otherwise, a case to consider is when S[i] equals T[j]; that is, the characters at the front of the two tails are equal. In that case, if we've already computed the answer for the tails without those characters, then we have a longer subsequence, because we can add this new matching character. Just as an example, if i equals 6 and j equals 7, then the tail of S for our example is acg and the tail of T is atacg. The longest common subsequence without the a's at the front is just cg, so the longest common subsequence with the a's at the front is acg. It's as simple as that, when the characters are equal.

If the characters are not equal, then with those two different characters at the front there are only two possibilities: one of them or the other could give us a longer common subsequence, so we just compute the maximum. Again from our example, say i equals 6 and j equals 4, so now we have two tails where the leading characters are different. We've already computed the answer for all smaller tails, so if we drop the c at the front of the second string's tail, the LCS is already computed, and its length in this case is three. Likewise, the answer if we delete the first character of the other string's tail is also already computed; they may be the same length, or one of them may be longer. In either case, any common subsequence of the smaller tails is also a common subsequence of the slightly bigger tails, so we take the maximum of those two, and that's the length of the longest common subsequence of the strings we're interested in. Those are the only three cases.
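The three cases just described can be summarized in one recurrence (this is only a restatement of the above, writing m and n for the lengths of S and T):

```
opt[i][j] = 0                                  if i == m or j == n    (an empty tail)
opt[i][j] = opt[i+1][j+1] + 1                  if S[i] == T[j]        (matching front characters)
opt[i][j] = max(opt[i+1][j], opt[i][j+1])      otherwise              (take the better of the two deletions)
```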
So all we have to do is arrange the order of the computation so that we're sure we've already computed the answers to the smaller subproblems before each problem that we want to consider. We set up an array with our string T across the top and our string S down the left, and at the ends we have all zeros; then all we do is work backwards through this array. Since both strings end in a g, the longest common subsequence when we're only considering the one g at the end of S is one. It doesn't get interesting until we get to the case where both of the tails start with c: there the length of the longest common subsequence is the entry one diagonal down, the one you get by taking off both those c's, plus one. The next case is the other one, where the leading characters are different; then we look at the entry to the right and the entry below, which are already computed, and take the larger one. So all we do is proceed through the array from bottom to top, from right to left. Every time we need an answer, it's either the one that's diagonally below, or the maximum of the one to the right and the one below, and we've already computed those. These are the numbers that you get for this pair of strings, and the final result is that the length of the longest common subsequence is seven. We can recover the subsequence itself with a similar computation.

Here's the implementation. It's not recursive; it's based on saving all the answers in a two-dimensional array, and it's just, in Java code, exactly the process that I described. We go backwards through the arrays: if the characters are equal, we compute opt[i][j] from opt[i+1][j+1]; otherwise, we take the max. When we're done, we just print out opt[0][0]. If you run this program for our example, you get seven.
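Since the slide code itself isn't reproduced here, the process just described can be sketched in Java as follows. (The transcript never spells out the second string; acggcggatacg is consistent with the examples given, e.g. its tail starting at 7 is atacg and it contains five g's, but treat it as an assumption.)

```java
public class LongestCommonSubsequence {
    // Length of the longest common subsequence of s and t, computed bottom-up.
    // opt[i][j] holds the LCS length of the tails s[i..] and t[j..];
    // row m and column n are left at zero for the empty-tail base case.
    public static int lcsLength(String s, String t) {
        int m = s.length(), n = t.length();
        int[][] opt = new int[m + 1][n + 1];
        for (int i = m - 1; i >= 0; i--)
            for (int j = n - 1; j >= 0; j--)
                if (s.charAt(i) == t.charAt(j))
                    opt[i][j] = opt[i + 1][j + 1] + 1;   // diagonal entry, plus one
                else
                    opt[i][j] = Math.max(opt[i + 1][j],  // entry below
                                         opt[i][j + 1]); // entry to the right
        return opt[0][0];
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("ggcaccacg", "acggcggatacg"));  // prints 7
    }
}
```

Note the loop order: i runs from m-1 down to 0 and j from n-1 down to 0, which guarantees that the three entries each case needs are already filled in.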
It's an amazingly simple solution to a problem that seemed to require comparing an exponential number of things against one another, and a very persuasive example of the power of dynamic programming. If you want to see the code that actually prints the LCS itself, you can look at LCS.java on the booksite.

So, dynamic programming and recursion are not toys; they're broadly useful approaches to solving problems. In both cases, you're combining solutions to smaller subproblems. In one case, you do all the small problems first and combine them to do bigger ones: that's dynamic programming. In the other case, you take your big problem and break it down into little ones. You need to learn both of these because they represent new modes of thinking and give us powerful programming paradigms for solving difficult problems. They both give us some real insight into the nature of computation, and they have been successfully used for decades in all kinds of scientific applications.

An advantage of recursion is that it's often pretty easy to figure out how to decompose a problem, and it's very easy to reason about correctness. With dynamic programming, on the other hand, you don't have to worry about exponential waste. You can often have a recursive solution with memoization, but it's hard to get much simpler than dynamic programming for an example like the one we just did for longest common subsequence. A pitfall of recursion, as we've talked about, is that exponential waste might be there, and it's not always so simple to figure out how to decompose the problem. For dynamic programming, you need a big array to save all the subproblems, which doesn't work so well if you have real-valued arguments, and sometimes it's challenging to determine the order of computation. But you definitely need to consider both of these approaches, and you'll encounter them over and over again as you take more advanced courses in computing.
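To round out the LCS example, here's a sketch of the "similar computation" that recovers the subsequence itself: fill the same array, then retrace it from the top-left corner, following the same three cases. (This is my own sketch, not the booksite's LCS.java.)

```java
public class LCSRecover {
    // Recover the longest common subsequence itself: fill the array exactly
    // as in the length computation, then retrace it from opt[0][0],
    // collecting a character whenever the two tails start with the same one.
    public static String lcs(String s, String t) {
        int m = s.length(), n = t.length();
        int[][] opt = new int[m + 1][n + 1];
        for (int i = m - 1; i >= 0; i--)
            for (int j = n - 1; j >= 0; j--)
                if (s.charAt(i) == t.charAt(j))
                    opt[i][j] = opt[i + 1][j + 1] + 1;
                else
                    opt[i][j] = Math.max(opt[i + 1][j], opt[i][j + 1]);

        StringBuilder result = new StringBuilder();
        int i = 0, j = 0;
        while (i < m && j < n) {
            if (s.charAt(i) == t.charAt(j)) {
                result.append(s.charAt(i));  // matching character: part of the LCS
                i++; j++;
            } else if (opt[i + 1][j] >= opt[i][j + 1]) {
                i++;                         // the larger subproblem is below
            } else {
                j++;                         // the larger subproblem is to the right
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(lcs("ggcaccacg", "acggcggatacg"));  // prints ggcaacg
    }
}
```

The retrace visits at most m + n entries, so recovering the sequence costs essentially nothing beyond filling the array.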