Introduction
Does your lab04 code work? …Are you sure? In this lab, we will use a really cool tool to determine if your code from lab04 actually works or not. Don’t worry! We won’t grade your lab04 until after lab05 is due.
Setup
There is nothing to clone for this lab, because you’ll just be working in your lab04 repository. All your work for this lab should go in fuzz.c
.
What is “fuzzing”?
A “fuzzer” is an automated tool that tries random inputs on your program in an attempt to make it fail. These inputs tend to start out completely random, but they narrow down as
the fuzzer determines features of your code. For example, in this lab, your fuzzer will quickly determine that *
is an interesting character and attempt to insert stars in many
places in the random string.
In this lab, your fuzzer will attempt to find strings that make a reference implementation of grep
output different results than your implementation of match
. Additionally,
it will try to find inputs that cause asan
to fail. This is very powerful, because it is extremely unlikely that the fuzzer will miss an entire case of outputs.
Step 1: Writing a reference implementation function for match
in C
The first step to writing a fuzzer is to have a reference implementation. To do this, we will send a command to the shell and capture its output. Then, we will assert that the correct
output matches the reference output. Specifically, you want to write a function void check(bool actual, char *pattern, char *text);
which sends the command
echo "text" | grep -c "^pattern$"
to the popen
function. We outline a little more detail below:
- Create a buffer large enough to hold the concatenated command.
- Concatenate the pieces of the command together into your buffer.
- Use
popen
to make aFILE *
which will contain a single character (either0
(if it didn’t match) or1
(if it did match)). - Read in the single character and check it against the provided
actual
boolean
. Note that the character you read in will be the ASCII character ‘0’ or ‘1’, not a boolean. - Use
pclose
to close theFILE *
.
Task 0.
Write the check
function to our specification.
Step 2: Eliminate edge cases we don’t care about
Because the fuzzer doesn’t know anything about what strings mean, it might generate invalid regular expressions (or strings that otherwise will confuse our program).
In this step, you will write a filter (a function called is_valid
) which takes in a candidate input string and filters. We will eliminate two types of strings:
- Any string which contains any character that is not a letter (capital or lowercase),
*
, or.
. - Any string which contains multiple
*
’s in a row
Your function should test for these two cases and return true
if and only if the string is “valid” by these criteria.
Task 1.
Write the is_valid
function to our specification.
Step 3: Writing an entry point
The LLVM fuzzer that we will be using is built into clang. If you compile with an extra flag, it will send inputs to the LLVMFuzzerTestOneInput
function.
This function recieves a buffer of data. It is our job to sanitize that data and split it up into a pattern and a string. The below implementation does the
trick. It’s a bit too complicated to have you write it on your own, but if you want to understand it completely, please send a message to Adam.
int LLVMFuzzerTestOneInput(uint8_t *data, size_t size) {
size_t intsize = size / sizeof(int8_t);
uint8_t *intdata = (uint8_t *)data;
if (intsize >= 2) {
size_t len1 = intdata[0];
if (size >= sizeof(uint8_t) + len1) {
size_t len2 = size - sizeof(uint8_t) - len1;
char *str = ( (char *)intdata ) + 1;
char *str1 = calloc(len1 + 1, sizeof(char));
char *str2 = calloc(len2 + 1, sizeof(char));
strncpy(str1, str, len1);
strncpy(str2, str + len1, len2);
if (is_valid(str1, len1) && is_valid(str2, len2)) {
printf("s1=%s, s2=%s\n", str1, str2);
bool result = match(str1, str2);
check(result, str1, str2);
free(str1);
free(str2);
return result;
}
free(str1);
free(str2);
}
}
return 0;
}
Task 2.
Copy this function into a file called fuzz.c
along with your is_valid
and check
functions you wrote.
Step 4: Editing the Makefile
Finally, add a rule to your Makefile
which compiles the fuzzer.
fuzzer: fuzz.c match.c
clang -g -fsanitize=address,undefined,fuzzer -fno-omit-frame-pointer $^ -o $@
Task 3.
Run the fuzzer using the command ./fuzzer -only_ascii=1
. If it crashes, the last string it outputs resulted in a failure. Let it run for
a little while before stopping it!
Getting Checked Off
Lab \(n\) will be due during lab hours for lab \(n+1\). You will need to get manually checked off for most labs as we will not be using totally automated tests. To get checked off, join the office hours queue, and we will get to you as soon as possible.