Now I'd like to talk to you about Partition Testing, which is sort of a systematic study of how we select good tests for different requirements, or how we choose values when we look at white-box test metrics as well. So, partition testing, oftentimes when we're looking at programs the input domains are huge. Like the domain of all integers, or the domain of floating point numberss, and once you have a program that has several of those inputs, it's no longer possible to exhaustively test all the values. So, what we want to do is we want to try and create partitions of this space of inputs in such a way that we're likely to rigorously test our application. And what we will do is partition a set S, and the set S is all the values of all the inputs. And we want to partition it in such a way that S is a collection of subsystems or subsets, such that the union of all the subsets is equal to S, so we don't miss any values, and there are no overlapping values in multiple partitions. So the idea is we're going to figure out a good way to carve up this space in such a way that any value we take out of one of these partitions is equally good as the other values in that partition. So that basically we can choose a representative, and that one will stand in as a representative for all of the values in that partition. And the idea is that if we choose our partitions well, then we're likely to sample from a partition that's faulty and a partition that's good and in this way we're going to, if the partition has described the faulty regions, we're going to find the faults by choosing a value from each partition. So the underlying concept is that we assume that failures are sparse in some parts of the input space and dense in others. So if you were to look at this graph as this base of all inputs, what our hypothesis is that, in some portions of that space you're likely to find a bug, and in others there's no bugs. So, these pink lines that are on the screen are the way that we thought about carving up this space. And in this case, actually we've come up with a good partitioning that's likely to find faults. So, in one case, it's guaranteed to find a fault in the bottom center part of that figure. Any value we choose is going to be faulty, so we are going to find a fault from that. And then the portion of the screen that's center near the top, it's very likely that since three out of the four values in that partition lead to faults, it's very likely that the one we choose is going to also lead to a fault. And rather than testing all the values of which there are many, we have I think around 11 or 12 partitions in here. So we're only going to need to choose 11 or 12 values to test. And if our hypothesis is accurate, that we've partitioned the values into equivalence classes in terms of faults, then we're still going to preserve a lot of the fault finding that we had initially. So let's look at a concrete example. We have the United States Postal Service, and you can look up a zip code and it will tell you all the cities that exist in that zip code. So what do you think some good partitions of the input space might be? Where do you think errors might manifest themselves when you look at different partitions? So I would like you think about that for a minute. So, some places that I think might be reasonable places for errors, first of all, zip codes have a range from 0 to 999999, and so if the text box allows you to type in negative numbers, that's a good partition to test. Integers generally in Java have both positive and negative values, and so one partition that's likely to be faulty is if you provide incorrect values to the system. Another one that you could check is the zip code 0, and the zip code 999999. Those are boundary values, and those boundary values we know programmers are likely to make mistakes at. So those are other partitions. Once we get into the middle, now we have to start thinking in more depth about the domain. So if you're thinking about zip codes, what things would have unique features in terms of the outputs? So we're thinking of the outputs as a list of cities. Well, it could be that certain zip codes don't have any cities in them, because the only things in that zip code are towns. So that might be a good partition, is the set of zip codes that don't have any cities. Another one might be zip codes that have an especially large number of cities. That might be a partition to consider, because maybe the output is limited to the number of cities that you can produce and maybe it might miss some. So, these partitions that you construct unfortunately are domain-dependent. You have to kind of understand both the structure of programs and also the structure of the domain you're working in in order to come up with good partitions. And then of course you should have a partition for kind of a normal zip code. So that's how I would do this, but there's no right or wrong answer in this case. So the idea is again that we're trying to find the portions of the input space where errors are likely to be dense. In most places errors are likely to be sparse, so the program wouldn't work at all. So as a tester you're trying to hone in on those parts of the input space where you're likely to find errors, and ideally we would make these partitions using our godlike knowledge of the domain in such a way that we knew if we selected any value from that partition we're going to reveal a fault. But of course, in reality, we don't know the domain that well, and we don't necessarily know that, even if we think we know the domain, the program is going to correspond to our knowledge. So we use heuristics, and we use experience to try and come up with good partitionings for our input values. And the other thing we do is quasi-partitioning. Rather than require that the partitions are disjoint, we usually allow overlapping partitions. I mean, cause in a way it doesn't matter. We just want to be able to sort of find the faulty areas of a program and they may overlap for different reasons. So we're just trying to increase the likelihood of finding partitions that are dense in failures. And what we've done, kind of informally and what we're talking about formally here is just ways of carving up the input space such that we can choose a value out of there and it's either likely to find a fault and so are the other values in that partition, or it's unlikely to find a fault and so are the other values in that partition. So we have another example I'd like you to consider. Let's look at the string reverse() function. This is about as simple as a function can be. You take a string argument andit returns a string with the order of characters reversed. So do we have a complete specification here? Mostly, mostly we do. But there are a couple of things especially if you're working in a language like Java that you don't know. So, first we don't know how we terminate a string. So, do we stop reversing when we hit a null character which is the way that C works when you're looking at strings? Or do we look at the length of the string which may be longer, may go beyond this null character, which is the way that Java works? And the other thing we don't really know is, is the reverse() function destructive? So when we reverse a string, do we still have the original string and we made a copy of it, or is the original string lost because we've sort of reordered the bits in it so that we have a new string? And so given this problem what are some representative values or classes of values to test? So hopefully that was instructive. And just to recap, what we have is we have a vast sea of possible program inputs. There's far too many for us to test them all. And so whether we think of it this way or we just come up with values sort of informally, we're creating partitions. We're choosing values that we think are likely to be illustrative of the behavior of the program, and so thinking about it a little bit systematically is a good thing. And when we're talking about partition testing, we're making it explicit. I believe that there are going to be faults in this class of inputs, but not in this class, and I'm going to choose representative values from each and see what the tests reveal. And the idea behind this is that errors are not uniformly distributed across the input space, they're concentrated in certain portions. And if we can figure out what those portions are and partition the input space appropriately then we're likely to find the faults. But determining those partitions requires skill and domain expertise, and it's never exact.