Hello. In this lecture, I want to talk about the notions of average-case complexity and recurrence relations. But before I do that, I also want to address the broader question of randomization, which is the theme of this week in algorithms and data structures. Perhaps the title should really be "randomization in algorithms and data structures," with technical mentions of average-case complexity and recurrence relations along the way. So far, starting from Week 1, when we have looked at the running time of algorithms, we have focused on the worst-case running time: the worst possible input that someone could actually craft, and the running time on that input. Here is an example of a function; it is not worth reading in detail, it is just an arbitrary function. It takes an array a of length n and iterates through it with a pair of nested for loops over i and j. Inside, there is an early-exit condition, which can trigger early. But in the worst case, that if-branch is never taken: for every n, you can always craft an array that always takes the other branch. By now, you should be able to argue that the running time of these loops is upper bounded by O(n^2), where n is the length of the array, and it is also lower bounded in the worst case by Omega(n^2), because for every n I can craft an input that never takes the early exit and therefore runs for time proportional to n^2. Combining the two, we get Theta(n^2). I hope this is all clear so far; this is the worst-case running time of algorithms. Now, let's talk about algorithms that use randomness. Here is an example of an "algorithm," really just a function, that uses randomness. What is the randomness here? The idea is that we are defining a function called geometric(p) that generates a geometric random variable with parameter p. What is a geometric random variable?
Well, think of p as the probability that a coin you toss turns up heads; likewise, 1 - p is the probability that it turns up tails. Imagine such a coin is given to you. What I am doing here is: I keep flipping the coin as long as it turns up heads, and the moment it turns up tails, I stop my experiment and simply count how many flips I made in total. In this example there were 1, 2, 3, 4 flips before I stopped, so four is the value returned by this program. The idea is that we use a statement called flip(p), which is a random number generator: with probability p it generates the value true, and with probability 1 - p it generates the value false, and we interpret true as heads and false as tails. If I get true, the result is 1 plus whatever I get by recursively running the whole process again, as if nothing had happened. On the other hand, if I get false, I just return 1. Those are the heads case and the tails case. It is for sure a weird way of writing such a program, but it is a valid random number generator. By the way, Python has a library called random, which generates random numbers, and you can use random.random() < p as an implementation of flip(p). You don't directly have a function called flip in Python, but you can always use the built-in functions to construct something like this. Now the question is: what is the worst-case running time of this function?
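Before answering that, here is a minimal Python sketch of what flip and geometric might look like, using random.random() as just suggested (the function names mirror the lecture's; this is my own sketch, not the official course code):

```python
import random

def flip(p):
    # Biased coin: True ("heads") with probability p, False ("tails") otherwise.
    # random.random() is uniform on [0, 1), so the comparison succeeds
    # with probability exactly p.
    return random.random() < p

def geometric(p):
    # Keep flipping while we see heads; the returned value counts every
    # flip made, including the final tails.
    if flip(p):
        return 1 + geometric(p)  # heads: one flip, then restart the experiment
    return 1                     # tails: stop; this single flip counts too
```

Note that for p close to 1 the recursion can get deep (Python's default recursion limit is around 1000), which already hints at the worst-case question just raised.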
What you would see is that you could argue the worst case is infinity: the coin flips could keep coming up true forever, and since you always get true, you are stuck in what looks like an infinite loop, constantly calling geometric(p) again. In the worst case, the coin is being controlled by your worst enemy, and with that adversary controlling the coin you are never going to terminate; so you could say that in the worst case this is non-terminating and its running time is infinity. But is that really the right question? If you assume that your coin is truly random, of course you should not imagine the worst case; what you really want is the average case: on average, how fast does this algorithm run? How many calls to geometric does it make before it terminates? Now, let's ask the question: why randomness? We can split the answer into two broad categories (there are other categories, but let's stick to two). In the first category, the reason we study randomness is this: you have implemented an algorithm, and you consider all possible inputs of some size, which could be an infinite set. Worst-case analysis, remember, always goes after the worst possible input, an input designed by a special adversary who knows your algorithm and is trying to make it look bad by making it run for a really, really long time before it finishes with an answer. That worst-case input is what makes the big-O upper bound tight, and it gives you the worst-case complexity. In many situations, though, you are not interested in the worst case, because you do not expect that kind of input to actually be produced. Maybe inputs come from a truly random source.
Maybe there is randomness in the inputs that an algorithm encounters, and therefore I want to ask: what does the running time on an average input look like? There are algorithms, which we will see this week, where this distinction matters a lot. There is a special algorithm called quicksort, where the difference between the worst case and the average case is actually fairly substantial: in the worst case, quicksort runs in n^2 time, but on an average input it runs in n log n time. There is also a famous algorithm called the simplex algorithm for solving linear programs, where the worst case is more like 2^n but the average case is more like n^3. That is a big, big, big difference between the worst possible input and how the algorithm behaves on average. So it is sometimes worth looking at the input of an algorithm as being random. The other paradigm is algorithms that internally have the ability to generate random numbers. For example, imagine an algorithm that from time to time rolls a die (let's assume it is a truly random die), and based on the outcomes of those rolls, makes decisions as it runs. We will see an example: the randomized version of quicksort, where the randomization happens inside the algorithm. Why would you want to do this? It is the same problem in reverse. You have various inputs, and it may be that the adversary knows intimately how your algorithm works and can craft a worst-case input. But if your algorithm randomizes internally, it can even out its behavior across inputs. What happens then is that your algorithm has different ways of computing the output, depending on the outcomes of the die rolls. Some of these executions are going to be efficient, and some are not.
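We will meet randomized quicksort properly later in the week; as a tiny preview of what "rolling a die inside the algorithm" looks like, here is a sketch (my own simplified, not-in-place version, which may differ in details from the one studied later):

```python
import random

def randomized_quicksort(a):
    # Minimal sketch of randomized quicksort, written out of place for clarity.
    if len(a) <= 1:
        return a
    # The random pivot choice is the internal "die roll": an adversary who
    # knows the code still cannot prepare an input that is reliably slow.
    pivot = random.choice(a)
    smaller = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    larger  = [x for x in a if x > pivot]
    return randomized_quicksort(smaller) + equal + randomized_quicksort(larger)
```

Every input gets sorted correctly no matter how the die falls; only the running time varies from run to run.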
Sometimes, when you get unlucky, a run is going to take a long time, but the algorithm becomes what we might call an equal-opportunity offender: for each input, there is some hopefully low probability that you get unlucky and take the worst possible path, and a reasonable probability that on average you take a good path. This is one reason we use randomness: it makes the adversary unable to choose a single input that is bad for the algorithm. But the price you pay is that you have introduced randomness into the algorithm, and as a result some executions may take longer than others. The worst case may remain bad, but the average case becomes good, and that is why we are interested in the average case. My drawing abilities suck, but I hope you are able to appreciate why we may want to use randomness. I already explained on the previous slide why we need average-case complexity. With an algorithm that uses randomness, what you will see is that some of the paths take long amounts of time, but with low probability, whereas plenty of paths finish faster. The point is that you want to ask: what is the expected amount of time, on average, that the algorithm takes to run? That is obtained by doing average-case complexity analysis. As an example, let's look at the average-case complexity of this geometric algorithm now. To do this, I am going to introduce a new device: let t(p) be the unknown average running time. How do I think about t(p)? One way is to say: if I end up in this path, that is, in case 1, the heads case, then the probability I end up in this case is p.
With probability p, I take the heads branch: that costs 1 plus another call to geometric(p), which on average takes t(p), so 1 + t(p) in total. With probability 1 - p, I take the tails branch, which costs 1. That is the relation written here: t(p) is 1 + t(p) with probability p, or 1 with probability 1 - p. In other words, t(p) = p(1 + t(p)) + (1 - p) * 1 = 1 + p * t(p). Since t(p) is the unknown, I can bring its terms to the left: (1 - p) t(p) = 1, and therefore you can check that t(p) = 1/(1 - p). For example, if you call this function with p = 0.25, the average running time is 1/(1 - 0.25), which is roughly 1.333, if I didn't mess up my math. This gives you an idea of how to analyze the average case. Here I used a trick called a recurrence relation. The topic of recurrence relations is deeply mathematical, and I would not be able to do it justice just yet. For now, I just want to give you the idea that sometimes, when we analyze algorithms and want to know the running time, it is useful to have these recurrence relations. Now let's talk about randomness in data structures. Why do we want randomness in data structures? It turns out to be for the same reason: in many cases, the data you want to store in a data structure is not random data. It is, say, data that comes from your worst adversary, who wants to make the data structure look bad. So one way to defeat that, going back to randomized algorithms, is to say: let me introduce some randomness into my data structure, so that when I start storing the data, I make some random decisions about where to put what data. Where does the data go? How is the data organized?
So the idea is that you introduce some randomness into your data structure. When you introduce randomness, the adversary can no longer predict up front what is going to happen when they insert data and perform a series of operations on the data structure. So it becomes very difficult, if not impossible, for an adversary to come up with bad examples. Of course, the price you pay is that certain outcomes of these internal die rolls or coin tosses will result in better performance, whereas certain other outcomes will result in worse performance, and the trade-off is that you always try to design things cleverly so that the average-case performance is good. That is what matters when you do randomization in data structures. All right, with all of that out of the way, let me get to the last topic: a little bit more on recurrence relations. It turns out we have already seen a bit of these recurrence relations before, when we analyzed the running time of merge sort. If you recall, in merge sort, when you are asked to sort an array a, you first check whether the length of a is less than or equal to one; this is called the base case, and then there is nothing to be done. Otherwise, you divide your array into two parts, which are roughly going to be of size n/2 each, you merge sort each of those two parts, and finally you merge the results. The merge runs in time proportional to the size of b_1 plus the size of b_2, in this case n/2 plus n/2, so it takes Theta(n) time to run.
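The merge sort just described can be sketched as follows (a standard top-down version; the variable names b1 and b2 match the two halves from the lecture, but the exact code is my own):

```python
def merge_sort(a):
    # Base case: arrays of length at most 1 are already sorted.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    b1 = merge_sort(a[:mid])  # left half, size roughly n/2
    b2 = merge_sort(a[mid:])  # right half, size roughly n/2
    # Merge: a single pass over both halves, so the time is proportional
    # to len(b1) + len(b2), i.e. Theta(n).
    out, i, j = [], 0, 0
    while i < len(b1) and j < len(b2):
        if b1[i] <= b2[j]:
            out.append(b1[i])
            i += 1
        else:
            out.append(b2[j])
            j += 1
    return out + b1[i:] + b2[j:]
```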
So if you wanted to write a recurrence, you would say: let T(n) be the worst-case running time of merge sort on an input of size n. Then T(n) = 2 T(n/2) + Theta(n), because you make two recursive calls on arrays of size n/2, and their worst-case running time is T(n/2) each, since by assumption T(n) is the worst-case running time on inputs of size n. The Theta(n) is the extra work: the work you do for the merge, plus all the other bits of work, all of which get absorbed into that Theta. That is the case when n is greater than 1; otherwise, T(n) is Theta(1). This is called a recurrence relation. If I can solve this recurrence, then I get to know the running time of merge sort. How do I solve this recurrence? Well, that is a rather long topic, and I will only give you a brief idea. Let me take a more general example. Here I have a generic algorithm on some input array: it does some processing steps, recurses on some new arrays that are smaller than the input array, and returns the result. There is a base case: if the size of the array is less than some constant M, you stop recursing. Typically, the running time of such an algorithm looks like T(n) = a T(n/b) + f(n), which holds when n is greater than the constant M. What is the idea here? This a is the number of recursive calls: you are making a calls back to the original algorithm. The size of the array for each of these calls is n/b, and f(n) is all the extra work you are doing in the processing steps, as some function of n. Otherwise, for the base case, you say T(n) is constant when n is at most M. This is a generic form of a recurrence relation for algorithms of this shape, and merge sort is one of the algorithms of this form, with a = 2, b = 2, and f(n) = Theta(n). How do you solve such a recurrence?
Well, it turns out there are many methods for solving this recurrence. One simple method is called the expansion method. Write the Theta(n) term as c n + d for constants c and d; for convenience, let me set d = 0 and just ignore it (you can redo the analysis with the full c n + d later on). So T(n) = 2 T(n/2) + c n. Expanding the inner term once: T(n) = 2 (2 T(n/4) + c n/2) + c n, and when this settles you have 4 T(n/4) + 2 c n. Doing it once more gives 8 T(n/8) + 3 c n; one more time, 16 T(n/16) + 4 c n; and in general, after k expansions, 2^k T(n/2^k) + k c n. The process terminates when n/2^k reaches 1, that is, when k = log_2 n. If you continue all the way there, you get T(n) = 2^(log_2 n) T(n / 2^(log_2 n)) + (log_2 n) c n. Now 2^(log_2 n) is just n, and n / 2^(log_2 n) is just 1, so this is n T(1) + c n log n. The n term is dominated by the n log n term, so T(n) = Theta(n log n). I went through that derivation really fast, and it is not that important for now, but note that these recurrences can be solved in many cases, and they give you the answers you expect to see already. I won't go any deeper into solving recurrences at this point; it is not that critical at this point of the class. We might come back to it later, when we really need to understand more of these recurrences.
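Before leaving the topic, one quick sanity check on the Theta(n log n) answer is to evaluate the recurrence numerically for powers of two, taking the hidden constants to be 1 (an arbitrary simplification on my part):

```python
import math

def T(n):
    # T(n) = 2*T(n/2) + n for n > 1, with T(1) = 1; n assumed a power of two.
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n

# For n = 2^k this recurrence solves exactly to n * (1 + log2(n)),
# which is Theta(n log n), matching the expansion argument.
for k in (4, 8, 12):
    n = 2 ** k
    print(n, T(n), n * (1 + math.log2(n)))
```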
Note that in your book there is something very nice called the master method, which tells you how to solve a large class of recurrences of this form, T(n) = a T(n/b) + f(n). You compute the quantity log base b of a, the critical exponent, and compare f(n) against n raised to that exponent. This splits into three cases, and in each of the three cases the method gives you the solution to your recurrence. I am not going to go any further into this; I just wanted to mention that the master method exists, and you can use it to solve recurrences. It is not going to be a deep topic for us, and we are not going to go any deeper just now in this class. Hopefully, you now have an idea of this whole space of randomized algorithms. I made a brief mention of average-case complexity, and we talked about analyzing recurrences using the master method. Thank you very much. I'll see you at the next lecture. Bye.