In this session we come back to the problem of proving properties about programs. Where before we covered just lists, we will now look at more general data structures, namely trees. Remember, in the intro to this course I told you that functional programming is important because it's very close to the mathematical theories of data structures. We'll put that to the test now: we will develop such a theory for integer sets and prove an implementation correct with respect to the theory. As was the case in the previous proving sessions, the material in this session is optional for the online class. If you're a student of the live EPFL class, you should follow the material, because it might be relevant for the exam.

We can generalize the structural induction principle for lists to arbitrary tree structures. The principle then becomes the following. We want to prove a property P(t) for all trees t of a certain type. To do that, we need to show that P(l) holds for all leaves l of the tree. And for each type of internal node t, which has, let's say, subtrees s1, ..., sn, we need to show that under the assumption that P(s1), ..., P(sn) hold, that is, all the subtrees satisfy the predicate, then P(t) holds.

So let's use this proof technique to show some interesting facts about IntSet. Recall our definition of IntSets from previous sessions. We had an abstract class IntSet with operations include and contains, and then we had two different implementations. One was an object Empty and the other was the class NonEmpty. And there was an invariant that we assumed, which was that the elements in a tree were ordered. That means that the left subtree of any nonempty tree contained elements that were smaller than the current element, and the right subtree contained elements that were larger. Our implementations of contains and include made use of that invariant. So we would like to prove that implementation of IntSet correct.
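The IntSet definition recalled above might look roughly like the following. This is a sketch reconstructed from the spoken description, not the exact code from the earlier session: the identifiers `incl` (for "include"), `Empty`, `NonEmpty`, `elem`, `left` and `right` are assumptions.

```scala
// Sketch of the IntSet hierarchy described above; all names are assumptions.
abstract class IntSet {
  def incl(x: Int): IntSet        // "include" in the lecture
  def contains(x: Int): Boolean
}

// The empty set: contains nothing; including x yields a one-node tree.
object Empty extends IntSet {
  def contains(x: Int): Boolean = false
  def incl(x: Int): IntSet = new NonEmpty(x, Empty, Empty)
}

// A node of an ordered binary tree.
// Invariant: every element of `left` is smaller than `elem`,
// every element of `right` is larger than `elem`.
class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
  def contains(x: Int): Boolean =
    if (x < elem) left.contains(x)        // x can only be in the left subtree
    else if (x > elem) right.contains(x)  // x can only be in the right subtree
    else true                             // x is the element at the root

  def incl(x: Int): IntSet =
    if (x < elem) new NonEmpty(elem, left.incl(x), right)
    else if (x > elem) new NonEmpty(elem, left, right.incl(x))
    else this                             // x is already at the root
}
```

For instance, `Empty.incl(5).incl(3).incl(8)` builds a tree with 5 at the root, 3 in the left subtree and 8 in the right subtree.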
But what does that even mean? What do we mean by proving the correctness of an IntSet implementation? Well, one way to define correctness would be to define some laws that our implementation should satisfy, and then prove that the implementation indeed does that. So in the case of IntSet, what laws could we come up with?

The first law says the empty set does not contain any element, so Empty contains x is always false. The second law says that if we add an element x to an arbitrary set s and then ask whether the set contains x, then we are certain that we will get back true: (s include x) contains x = true. And the third law says that if we add an element x to a set s and then ask whether the set contains some other element y, then the answer is the same as simply s contains y: (s include x) contains y = s contains y, provided x and y are different. So it didn't matter whether we added x or not. The answer will be invariant under that. In fact, one can show that these three laws completely characterize what it means to be an IntSet. So we now have an algebraic specification of IntSet which is complete.

But it still remains to prove these laws. So let's start with the first one: Empty contains x = false. That one is actually easy, because it's a direct consequence of the definition of contains in Empty. Have a quick look at it: here you see that Empty contains any element would give us false.

The second proposition says that if we include x in s and then ask whether the set contains x, we would get true. That we can prove by structural induction on the set s. The base case would be that the set s is Empty, so we are left with the expression (Empty include x) contains x. Now, Empty include x, we know what that is by the rule of Empty.include. That would give us NonEmpty(x, Empty, Empty), a nonempty set with element x and two empty subtrees, and we ask whether that one contains x. The answer here is true because of the clause of contains in a nonempty set, where we know that if we ask for the element at the top of the tree then the answer is true.
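The three laws stated above can also be written down as executable checks. The following is only a sketch: it repeats an assumed IntSet definition (the names `incl`, `Empty`, `NonEmpty` are assumptions), and the helpers `IntSetLaws`, `fromList` and `holdOnSamples` are hypothetical names introduced here for illustration. Running such checks on a few sample sets is a sanity test, not a proof; the proof is the structural induction carried out in the lecture.

```scala
// Assumed IntSet definition (names are assumptions, see lead-in).
abstract class IntSet {
  def incl(x: Int): IntSet
  def contains(x: Int): Boolean
}

object Empty extends IntSet {
  def contains(x: Int): Boolean = false
  def incl(x: Int): IntSet = new NonEmpty(x, Empty, Empty)
}

class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
  def contains(x: Int): Boolean =
    if (x < elem) left.contains(x)
    else if (x > elem) right.contains(x)
    else true

  def incl(x: Int): IntSet =
    if (x < elem) new NonEmpty(elem, left.incl(x), right)
    else if (x > elem) new NonEmpty(elem, left, right.incl(x))
    else this
}

// Hypothetical helper object: the three laws as Boolean-valued checks.
object IntSetLaws {
  // Law 1: Empty contains x = false
  def law1(x: Int): Boolean = !Empty.contains(x)

  // Law 2: (s incl x) contains x = true
  def law2(s: IntSet, x: Int): Boolean = s.incl(x).contains(x)

  // Law 3: (s incl x) contains y = s contains y, provided x != y
  def law3(s: IntSet, x: Int, y: Int): Boolean =
    x == y || s.incl(x).contains(y) == s.contains(y)

  // Build a set by including the elements one after the other.
  def fromList(xs: List[Int]): IntSet =
    xs.foldLeft[IntSet](Empty)((s, x) => s.incl(x))

  // Check all three laws on a handful of sample sets and elements.
  def holdOnSamples: Boolean = {
    val sets = List[List[Int]](Nil, List(5), List(5, 3, 8), List(1, 2, 3, 4), List(9, 7, 4, 2))
      .map(fromList)
    val elems = (0 to 10).toList
    elems.forall(law1) &&
      sets.forall(s => elems.forall(x => law2(s, x))) &&
      sets.forall(s => elems.forall(x => elems.forall(y => law3(s, x, y))))
  }
}
```

The base case of the second law corresponds to `IntSetLaws.law2(Empty, x)`: `Empty.incl(x)` is a one-node tree with `x` at the root, and `contains(x)` on that tree takes the final `else true` branch.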
We can compare to the implementation of NonEmpty to verify that. So that was the base case.

What about the induction step? The induction step would be that we have a tree, call it NonEmpty(z, l, r), with element z and subtrees l and r, and we have to prove the proposition that (NonEmpty(z, l, r) include x) contains x is true for each of these trees. We actually have two cases here. We could have the case that z is the same as x, or that z, the element of the nonempty tree, is different from x. Let's take these two cases in turn.

For the first case, I assume that z equals x, so I'm left with the tree NonEmpty(x, l, r), and have to show that (NonEmpty(x, l, r) include x) contains x equals true. So what can we do in this case? Well, we can look at the definition of include. If we look that up, then we find that including an element in a tree that already has that element at the root gives the original tree. So this expression simplifies to NonEmpty(x, l, r) contains x. And then, looking up the contains operation, we find that asking contains on a tree that has that element at the root would give you back true. So the whole expression simplifies to true. That handles the case where we're left with a nonempty tree and the root element x of the tree was the same element as the one we included and then checked with contains.

What if the root element is different? There again we have two choices. Either the root element is smaller than our element x or it's larger. So let's look at the case where it is smaller. We have a tree NonEmpty(y, l, r) with y < x, we include x, we ask whether it contains x, and we would like it to return true. By the definition of NonEmpty.include, we can rewrite this term to NonEmpty(y, l, r include x) contains x. Why? Well, because we know that x is greater than the root element y, so we would have a recursive include on the right-hand side of the tree. Okay, let's look at contains now.
Again by the same reasoning, we would have that a contains test on a tree like that would translate into a contains test on its right subtree. So that would be (r include x) contains x. And now we can apply the induction hypothesis, which says that for all subtrees I assume the property is proven, so I am left with true. There's a third induction step to do, where now the root element y of the tree is greater than x, but this one is completely analogous to the previous one, so I'm going to omit it.

Now, let's prove the third proposition. That proposition reads: (xs include y) contains x is the same as xs contains x, provided x and y are different. So, if x and y are different, it makes no difference whether I add y to the set and then ask whether it contains a given element x, or whether I ask the set directly. The proof again would be by structural induction on xs. Assume first that the element that we add is smaller than the element we test for, that is, y < x. The dual case where the element we add is larger is completely analogous, so we don't need to do both cases.

The base case, then, would be that the set is empty, so we include an element y into an empty set, and then we ask whether it contains x. What we have to show is that that's actually the same as asking the empty set directly whether it contains x. So, Empty include y gives us NonEmpty(y, Empty, Empty). Asking whether that contains x gives us Empty contains x. More precisely, we go into the right subtree, because x is bigger than y, and that's the Empty here. And that establishes the base case. That's what we needed to show.

Now we have to do the inductive step. The inductive step is for a tree NonEmpty(z, l, r), with some root element z, a subtree l, and a subtree r. And unfortunately, there are five different cases to consider. The first case is that the root of the tree is the same as x. The second one is, it's the same as y. The third one is, it's smaller than both y and x.
The fourth one is, it's between y and x. And the fifth one is, it's larger than both y and x. So let's look at these cases in turn.

The first two cases are easy. Let's first assume that the root of our tree is x. So we have this expression here: (NonEmpty(x, l, r) include y) contains x. If we include y in a tree like that, then what happens is that we actually go to the left subtree and include y there, because by assumption y is smaller than x. That gives us NonEmpty(x, l include y, r). Then we ask whether that tree contains x, and here the answer is obviously yes, because the tree already contains x at the root. So by the definition of NonEmpty.contains we get back true, which is what we wanted. And if we look at the right-hand side, NonEmpty(x, l, r) contains x, then by the same reasoning that one is also true. So the equation is established.

The second easy case is where the root of the tree is the same as y. So now we include y in a tree that already has root y, and that of course is the same as the original tree. That doesn't change anything, so both sides are NonEmpty(y, l, r) contains x, and that again is what we wanted.

So now we come to the more difficult cases. The first of them is that we are left with the tree NonEmpty(z, l, r), where z is smaller than both y and x. In that case we need to show again that (NonEmpty(z, l, r) include y) contains x is the same as just NonEmpty(z, l, r) contains x. So what can we do here? Well, again we apply the definition of NonEmpty.include to conclude that we have to include the element y in the right subtree, because y is greater than z. Then we apply the definition of contains to conclude in turn that we have to look at the right subtree, because x is also greater than z. And then we can apply the induction hypothesis to say that (r include y) contains x is the same as r contains x, because we assume the theorem to be already proven for r.
And that, in fact, is the same as NonEmpty(z, l, r) contains x, because if we simplify that expression, we see that because x is greater than z, we look again at the right subtree r. So again we have established the equality.

The next case is where z is now between y and x, so we have the same situation as before but with y < z < x. What we do in this case is that, including y into the tree, we go to the left subtree, because y is smaller than z. Asking contains, we go to the right subtree, because x is larger than z. So we actually include and test in different subtrees, and we're left with r contains x. And that is already the same as NonEmpty(z, l, r) contains x, by the definition of NonEmpty.contains applied backwards, because for this tree we again look in the right subtree. So we see that in this case we've established the equality without resorting to the induction hypothesis, because the inclusion and the test fell into different subtrees.

The last case is where z is larger than both y and x. That's actually a complete dual of the case where z was smaller than both y and x. So I have written down the proof here, but I will not go into the details one by one.

These are all the cases, so the proposition is established. This proof was quite involved, but on the other hand we were also showing something quite significant, namely the correctness of a non-trivial implementation of sets as binary trees. I would argue that the complexity of purely functional equational proofs often compares favorably with what you would have to do in an imperative language.

If you haven't had enough of proving yet, here's an exercise for you which is, in fact, quite hard. We come back to the question of adding union to IntSet. So here's a way to do it, which is actually a bit more efficient than the first solution that I've shown you in the worksheet.
The union operation on the empty set gives, of course, the other set that we pass to union. And union on a nonempty set would be defined like this: we take the left subtree, we union it with the right subtree unioned with the other set, and finally we include x into the resulting tree at the end.

What I would like you to do is to prove the correctness of union, which translates into the following law. If we take the union of two sets xs and ys, and we then ask whether it contains an arbitrary element x, then this is equivalent to asking whether either xs contains x or ys contains x. So both sides should be true or false for the same sets and the same elements. The task then is to show this proposition by structural induction on xs.
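The union just described, together with the law to be proved, can be sketched as follows. The identifiers (`incl`, `Empty`, `NonEmpty`, `elem`, `left`, `right`) are assumptions, and `UnionLaw`, `fromList` and `holdsOnSamples` are hypothetical helpers introduced only for this illustration. Checking the law on sample sets can build confidence, but it does not replace the inductive proof asked for in the exercise.

```scala
// Assumed IntSet definition, extended with union as described above.
abstract class IntSet {
  def incl(x: Int): IntSet
  def contains(x: Int): Boolean
  def union(other: IntSet): IntSet
}

object Empty extends IntSet {
  def contains(x: Int): Boolean = false
  def incl(x: Int): IntSet = new NonEmpty(x, Empty, Empty)
  // Union with the empty set is the other set.
  def union(other: IntSet): IntSet = other
}

class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
  def contains(x: Int): Boolean =
    if (x < elem) left.contains(x)
    else if (x > elem) right.contains(x)
    else true

  def incl(x: Int): IntSet =
    if (x < elem) new NonEmpty(elem, left.incl(x), right)
    else if (x > elem) new NonEmpty(elem, left, right.incl(x))
    else this

  // Left subtree, unioned with (right subtree unioned with other),
  // and finally the root element included into the result.
  def union(other: IntSet): IntSet =
    left.union(right.union(other)).incl(elem)
}

// Hypothetical helper object: the union law as an executable check.
object UnionLaw {
  def fromList(xs: List[Int]): IntSet =
    xs.foldLeft[IntSet](Empty)((s, x) => s.incl(x))

  // (xs union ys) contains x = (xs contains x) || (ys contains x)
  def holds(xs: IntSet, ys: IntSet, x: Int): Boolean =
    xs.union(ys).contains(x) == (xs.contains(x) || ys.contains(x))

  // Check the law on all pairs of a few sample sets.
  def holdsOnSamples: Boolean = {
    val samples = List[List[Int]](Nil, List(4), List(4, 2, 6), List(1, 3, 5, 7), List(8, 6, 2))
      .map(fromList)
    val elems = (0 to 9).toList
    samples.forall(a => samples.forall(b => elems.forall(e => holds(a, b, e))))
  }
}
```

In the proof by structural induction on xs, the `Empty.union` clause gives the base case, and the `NonEmpty.union` clause is where the induction hypothesis for the subtrees and the laws for include come into play.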