In this lecture, we're going to talk about automating security testing using a technique called fuzz testing. With security testing, what we're trying to do is reveal flaws in the mechanisms used to protect data and computational resources. It's a regular occurrence these days that people break into software using things like buffer overflows or improperly handled SQL strings, and are able to steal information such as passwords, or hijack computational resources, maybe using your computer for Bitcoin mining. So, with security testing, we're trying to figure out where those holes are in the software and plug them. Our goal is to determine a level of confidence that the software is resilient to attack. The nice thing here is that modern security testing regimens usually include automated test generation. This isn't just a research area; it's common practice, and I'm going to show you some of the tools that are used. However, even with current tools, without proof it's very difficult to make strong statements about security. Even if we run lots and lots of fuzz tests, and I'll describe fuzz testing on the next slide, we don't have complete confidence that our software is going to be secure. Nevertheless, fuzz testing is an integral part of modern software development. As one pundit put it, if you don't fuzz test your software, it's guaranteed that somebody else will. So, what is fuzz testing? Fuzz testing is simply running a program on many abnormal inputs, looking for exploitable behavior. If we run these abnormal inputs and get errors out, that's okay. If we get assertion violations saying the input doesn't match our expectations, that's okay.
But crashes of the software are not okay, because those indicate that there may be exploitable bugs in the system, and privilege escalations, which are a known class of exploitable bug, are definitely not okay. Once we have the crashes, we might see that some of them directly lead to exploitations of the system. But in order to determine whether those exploitations exist, we may have to instrument our software to detect when certain bad things occur. Take buffer overflows. A buffer overflow is when you have some array in code and you write past the end of it. Because of the way computer programs are laid out in memory, writing past the end might land in an area containing code, and if an attacker controls what gets written, they might write their own program code into that area. Later on, that area may be executed, and then the program will do what the attacker wants, not what it originally intended. Another thing to watch for is non-release of resources. Attackers also mount denial-of-service attacks, where they hammer the system with inputs in such a way that it starts to fail to respond and no longer serves legitimate users. One way they do this: if there's a bug in the system that causes it not to release things like file handles, eventually the system will lock up because all the file resources have been used up. Another thing that can happen is use of invalid memory. Somewhere along the line there's a pointer to memory that is no longer valid, and the attacker can use that to take over the system. There are lots of possibilities; for denial of service, you can also exploit deadlock. So, what we're going to do is throw lots of these abnormal inputs at the system, and the first thing we have to decide is: what are the inputs to the system? We're going to look at the attack surface of the program.
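The classification just described, in which handled errors and assertion violations are acceptable but crashes are not, can be sketched as a tiny fuzzing harness. Everything here is illustrative: the `parse` function is a made-up stand-in for the real program under test, and its failure modes are invented.

```python
import random

def parse(data: bytes) -> None:
    """Made-up stand-in for the program under test (hypothetical)."""
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("not a JPEG")        # a handled, expected rejection
    if len(data) > 4 and data[4] == 0x00:
        raise MemoryError("boom")             # stand-in for a real crash

def run_one(data: bytes) -> str:
    """Classify one fuzzed input: errors and assertion violations
    are okay, but anything crash-like is flagged for investigation."""
    try:
        parse(data)
        return "ok"
    except (ValueError, AssertionError):
        return "handled-error"
    except Exception:
        return "CRASH"                        # possibly exploitable

# Throw some purely random inputs at the target.
random.seed(0)
outcomes = {run_one(bytes(random.randrange(256) for _ in range(8)))
            for _ in range(100)}
```

A real harness (AFL, libFuzzer) runs the target in-process or in a subprocess and treats signals like SIGSEGV as the crash indicator; the exception-based version above only sketches the classification logic.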
The attack surface is usually described in terms of things like files, input strings, parameter files, network packets, you name it: anything that can be used to provide user input to the system. Oftentimes, these inputs have a structure that has to be mimicked. What we want are inputs that are close to, but not quite, valid. Thinking about what this not-quite-validness means is tricky, and it's difficult to do manually; it's hard to create by hand a lot of tests that are close to but not quite valid. So, we turn this over to tools. Let me give you an example. The JPEG file format has to start with two bytes, FF and D8. Why these values? I don't know. But then it has an 18-byte header that describes the size of the image, the color scheme, and other information, then a variable-length thumbnail area where you can store a small version of the picture, then several other headers. Finally, it has the image data, and at the very end, the two bytes FF D9, which mark the end of the image. If we just randomly write stuff in there, say we change the first byte, the system will know it's no longer a JPEG file and will probably error out immediately without exhibiting any exploitable behavior. So, we need to make subtle changes to the structure of the file so that we can find bugs deep in the parsing of these JPEG files. What could we do? We can change the values of bytes in the file: add or subtract 1, replace a byte with a random value, invert the bits in a byte, or negate its value, treating it as an integer. Another thing we can do is transpose values, taking either single bytes or whole regions of the file and switching them. We can also truncate the file: if it was 50 kilobytes before, maybe we make it 40 kilobytes. With truncation, oftentimes you have to preserve some region at the very end.
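The byte-level mutations just listed can be sketched in a few lines of Python. The function names, the operator list, and the 20% truncation amount are all illustrative choices, not taken from any particular fuzzer:

```python
import random

def mutate_byte(data: bytes, rng: random.Random) -> bytes:
    """Apply one of the byte-level mutations from the lecture."""
    buf = bytearray(data)
    i = rng.randrange(len(buf))
    op = rng.choice(["add", "sub", "rand", "invert", "negate", "transpose"])
    if op == "add":
        buf[i] = (buf[i] + 1) % 256           # add 1
    elif op == "sub":
        buf[i] = (buf[i] - 1) % 256           # subtract 1
    elif op == "rand":
        buf[i] = rng.randrange(256)           # replace with a random value
    elif op == "invert":
        buf[i] ^= 0xFF                        # invert the bits in the byte
    elif op == "negate":
        buf[i] = (-buf[i]) % 256              # negate, treating it as an integer
    else:
        j = rng.randrange(len(buf))           # transpose two byte positions
        buf[i], buf[j] = buf[j], buf[i]
    return bytes(buf)

def truncate(data: bytes, keep_tail: int = 2) -> bytes:
    """Shorten the file while preserving a trailing region
    (for JPEG, the FF D9 end-of-image marker)."""
    cut = len(data) // 5                      # drop roughly 20% of the bytes
    return data[:len(data) - keep_tail - cut] + data[-keep_tail:]
```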
Another thing we could do is extend the file: either duplicate the last values in the file, or add random values. Now, to do this well, keep in mind that the JPEG format has very specific requirements for the file layout, so we're going to use some heuristics. Usually, fuzzers focus on the beginnings and ends of files, because that's where the header and trailer information is, and by making changes there, you can sometimes cause the system to incorrectly parse the information, which may lead to an exploitable situation. We're also going to favor less severe modifications. We could transpose the whole file or truncate it to zero, but chances are that wouldn't lead to an interesting fuzzed file that the system fails on; it would just look at it and say, no, this isn't a JPEG. So for the most part, we want our fuzzer to do things that are subtle, maybe changing values by plus or minus one. The other thing we want to do is monitor when our changes lead to different code paths being executed, and focus our attention there. So there are a whole bunch of heuristics that go into fuzzing things well. Let me show you what this looks like. Here we start with an image of a bunny, and what we're seeing is the behavior of a fuzzer called American Fuzzy Lop (AFL). What it's doing right now is trying to truncate the file.
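Two of the heuristics just mentioned, biasing mutations toward the beginnings and ends of files and extending a file by duplicating its tail or appending noise, might be sketched like this. The 80/20 split, the 32-byte edge window, and the 8-byte extension are made-up tuning values, not any real fuzzer's defaults:

```python
import random

def pick_position(n: int, rng: random.Random) -> int:
    """Bias mutation positions toward the first and last bytes of the
    file, where header and trailer information usually lives."""
    if rng.random() < 0.8:                    # most of the time, hit an edge
        edge = min(32, n)
        off = rng.randrange(edge)
        return off if rng.random() < 0.5 else n - 1 - off
    return rng.randrange(n)                   # occasionally, anywhere at all

def extend(data: bytes, rng: random.Random) -> bytes:
    """Grow the file: duplicate the last values, or append random ones."""
    if rng.random() < 0.5:
        return data + data[-8:]
    return data + bytes(rng.randrange(256) for _ in range(8))
```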
It's trying to start from a small seed that it can then fuzz in different ways. Then you'll start to see a bunch of random stuff: it's very quickly trying out many different parameters and randomizations to generate something that's interesting for testing the JPEG parser. It turns out that, after a certain amount of time, it will happen upon an input that satisfies the structural obligations of JPEG yet causes the system to misbehave. This goes on for a good several minutes, because it's difficult to find that exact code path. But I just wanted to give you a visual idea of what's going on during this kind of fuzzing. So, how do we fuzz? We've seen a bunch of examples already, but we can separate things into two orthogonal dimensions. First, we can look at the input structure. Some fuzzers are "dumb" fuzzers, which means they start from a set of "seed" inputs that satisfy the input constraints, for example a valid JPEG file, and then make a whole bunch of mutations to those seeds. The nice thing about "dumb" fuzzers is that they don't have to know anything about your program; they will happily go ahead and make lots of random modifications. On the other hand, "smart" fuzzers use an explicit model of what the input is supposed to look like, say a grammar for the JPEG format, and they use that grammar to make small modifications: something that doesn't actually satisfy the grammar, but is very close to something that does. This is necessary for sophisticated inputs. Suppose your file format has error-correcting codes at the end. If you make any fuzzing changes elsewhere in the file, those will be detected by the error-correcting codes, and the input will be immediately rejected by the program. So, properly fuzzing files that have error correction in them is very difficult without a "smart" fuzzer.
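To make the error-correction point concrete, here is a sketch using a hypothetical record format (invented for illustration) that ends in a CRC32 checksum. A "dumb" mutation leaves the checksum stale, so the parser rejects the input at its integrity check before any deep code runs; a "smart" mutation recomputes the checksum so the fuzzed input reaches the deeper parsing code:

```python
import random
import zlib

def make_record(payload: bytes) -> bytes:
    """Hypothetical format: 4-byte length, payload, 4-byte CRC32."""
    body = len(payload).to_bytes(4, "big") + payload
    return body + zlib.crc32(body).to_bytes(4, "big")

def dumb_mutate(record: bytes, rng: random.Random) -> bytes:
    """Flip one payload byte and leave the CRC stale: the parser
    will reject this input before any deep code runs."""
    buf = bytearray(record)
    buf[4 + rng.randrange(len(buf) - 8)] ^= 0xFF
    return bytes(buf)

def smart_mutate(record: bytes, rng: random.Random) -> bytes:
    """Same flip, but recompute the CRC so the input passes the
    integrity check and exercises the deeper parsing code."""
    body = bytearray(record[:-4])
    body[4 + rng.randrange(len(body) - 4)] ^= 0xFF
    return bytes(body) + zlib.crc32(bytes(body)).to_bytes(4, "big")

def accepts(record: bytes) -> bool:
    """The integrity check a structured parser might run first."""
    body, crc = record[:-4], int.from_bytes(record[-4:], "big")
    return zlib.crc32(body) == crc
```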
So that's the input structure. The second dimension is the program structure. We could treat the fuzzer as a black-box tool: a "dumb" black-box fuzzer is basically just random testing. But we can also do white-box fuzzing, where we take the fuzzed inputs and run them through a symbolic analysis tool to see which program paths they execute. There's a tool called Microsoft SAGE, which was used to find many of the buffer overflows discovered during the development of Windows 7, that works this way: it ties randomization tools to a symbolic execution engine to try to get really good coverage of the code. Then there's another school that instruments the code. You run the random fuzz tests on an instrumented program, the instrumentation tells you which code paths were taken, and the fuzzer then tries to pick values that will cause different code paths to be chosen. So we have two dimensions we can use for constructing fuzzers, and there's a range of tools that fall along this spectrum. Just to recap in terms of other approaches: "dumb" black-box fuzzing is another name for random testing. We can do "dumb" instrumented fuzzing, which is kind of like adaptive random testing, except that rather than choosing an input value that's far away from the current one, we choose the next random value based on the structure of the program we wish to test. For example, focusing in on a particular condition, we try to choose a value greater or lesser in such a way that it will cause that condition to change value. Then we can look at "dumb" analytical fuzzing, where we don't really understand the input format, but we do have a symbolic representation of the paths through the program structure.
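The instrumentation-guided school described above can be caricatured in a few lines: run each mutant against an instrumented target, and keep any input that lights up a branch we haven't seen before. The toy `parse` target and its branch labels are hypothetical, and real coverage-guided fuzzers like AFL track edge coverage in a shared-memory bitmap rather than Python sets:

```python
import random

def parse(data: bytes, trace: set) -> None:
    """Toy instrumented target: `trace` records which branches ran."""
    trace.add("start")
    if data[:1] == b"\xff":
        trace.add("magic1")
        if data[1:2] == b"\xd8":
            trace.add("magic2")
            if len(data) > 20:
                trace.add("deep")             # the code we want to reach

def coverage_guided_fuzz(seed: bytes, iterations: int = 500) -> set:
    """Mutate inputs drawn from a growing corpus, keeping any mutant
    that exercises a branch we have not seen before."""
    rng = random.Random(0)
    corpus, seen = [seed], set()
    for _ in range(iterations):
        parent = bytearray(rng.choice(corpus))
        parent[rng.randrange(len(parent))] = rng.randrange(256)
        trace: set = set()
        parse(bytes(parent), trace)
        if trace - seen:                      # new coverage: keep this input
            seen |= trace
            corpus.append(bytes(parent))
    return seen
```

Starting from a seed that already parses, most single-byte mutations preserve the magic bytes, so mutants quickly reach and retain the deeper branches.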
And so with this analytical approach, it takes longer to examine each path, but we know exactly what we need to do to change a condition from one value to another, and we can use that analytical data to suggest new values for fuzzing. Then, over on the right side of the figure, we have "smart" fuzzing, where we actually need a lot of knowledge about the file format so that we can direct the fuzzer to generate appropriate values. These tools are really useful out of the box, but the more knowledge you have of the system, the better you can use them: knowing where to fuzz, how to fuzz, and how to direct the fuzzer leads to better results. Each of these fuzzing tools has a bunch of parameters you can use to tune its heuristics. Using multiple fuzzers also improves bug finding: since they all take different approaches, it's easy to parallelize, and you get a diversity of results. And finally, the more time you give the fuzzers, the better they do. Some other, more advanced things you can do with fuzzers: you'd like to be able to distinguish failure-causing inputs from non-failure-causing inputs. Some of the advanced tools, when they find a failure, will generate a whole bunch of similar inputs to try to determine what it is about that input that causes the failure. Then, once you have the failure-inducing input, determining what's wrong with the program is still another issue. And finally, because these fuzzers can find so many bugs, one of the tricky parts is determining severity. There was a recent use of fuzz-like tools to examine a Debian distribution; they found something like 10,000 crash bugs. Only a small fraction of those actually lead to exploitable failures that can cause a security violation.
So trying to categorize those faults is an active area of research for fuzz testing tools.
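One simple way to get at "what is it about this input that causes the failure" is input minimization: repeatedly delete bytes as long as the smaller input still fails, in the spirit of delta debugging or AFL's afl-tmin. This greedy one-byte-at-a-time version is a sketch, not either tool's actual algorithm:

```python
def minimize(data: bytes, fails) -> bytes:
    """Shrink a failure-inducing input: drop each byte in turn, keeping
    the deletion whenever the smaller input still triggers the failure."""
    out = bytearray(data)
    i = 0
    while i < len(out):
        trial = out[:i] + out[i + 1:]
        if fails(bytes(trial)):
            out = trial                       # byte was irrelevant: drop it
        else:
            i += 1                            # byte is needed: keep it
    return bytes(out)
```

Whatever survives minimization is, byte for byte, the part of the input the failure actually depends on, which makes the later debugging step much easier.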