CS 3 (Spring 2021) Lab 05: Fuzzing

This lab will focus on a novel way of testing code that uses automated tools.

Introduction

Does your lab04 code work? …Are you sure? In this lab, we will use a really cool tool to determine if your code from lab04 actually works or not. Don’t worry! We won’t grade your lab04 until after lab05 is due.

Setup

There is nothing to clone for this lab, because you’ll just be working in your lab04 repository. All your work for this lab should go in fuzz.c.

What is “fuzzing”?

A “fuzzer” is an automated tool that tries random inputs on your program in an attempt to make it fail. These inputs tend to start out completely random, but they narrow down as the fuzzer determines features of your code. For example, in this lab, your fuzzer will quickly determine that * is an interesting character and attempt to insert stars in many places in the random string.

In this lab, your fuzzer will attempt to find strings that make a reference implementation of grep output different results than your implementation of match. Additionally, it will try to find inputs that cause asan to fail. This is very powerful, because it is extremely unlikely that the fuzzer will miss an entire case of outputs.

Step 1: Writing a reference implementation function for match in C

The first step to writing a fuzzer is to have a reference implementation. To do this, we will send a command to the shell and capture its output. Then, we will assert that the correct output matches the reference output. Specifically, you want to write a function void check(bool actual, char *pattern, char *text); which sends the command

echo "text" | grep -c "^pattern$" to the popen function. We outline a little more detail below:

  1. Create a buffer large enough to hold the concatenated command.
  2. Concatenate the pieces of the command together into your buffer.
  3. Use popen to make a FILE * which will contain a single character (either 0 (if it didn’t match) or 1 (if it did match)).
  4. Read in the single character and check it against the provided actual boolean. Note that the character you read in will be the ASCII character ‘0’ or ‘1’, not a boolean.
  5. Use pclose to close the FILE *.

Step 2: Eliminate edge cases we don’t care about

Because the fuzzer doesn’t know anything about what strings mean, it might generate invalid regular expressions (or strings that otherwise will confuse our program). In this step, you will write a filter (a function called is_valid) which takes in a candidate input string and filters. We will eliminate two types of strings:

  1. Any string which contains any character that is not a letter (capital or lowercase), *, or ..
  2. Any string which contains multiple *’s in a row

Your function should test for these two cases and return true if and only if the string is “valid” by these criteria.

Step 3: Writing an entry point

The LLVM fuzzer that we will be using is built into clang. If you compile with an extra flag, it will send inputs to the LLVMFuzzerTestOneInput function. This function recieves a buffer of data. It is our job to sanitize that data and split it up into a pattern and a string. The below implementation does the trick. It’s a bit too complicated to have you write it on your own, but if you want to understand it completely, please send a message to Adam.

int LLVMFuzzerTestOneInput(uint8_t *data, size_t size) {
    size_t intsize = size / sizeof(int8_t);
    uint8_t *intdata = (uint8_t *)data;
    if (intsize >= 2) {
        size_t len1 = intdata[0];
        if (size >= sizeof(uint8_t) + len1) {
            size_t len2 = size - sizeof(uint8_t) - len1;
            char *str = ( (char *)intdata ) + 1;
            char *str1 = calloc(len1 + 1, sizeof(char));
            char *str2 = calloc(len2 + 1, sizeof(char));
            strncpy(str1, str, len1);
            strncpy(str2, str + len1, len2);
            if (is_valid(str1, len1) && is_valid(str2, len2))  {
                printf("s1=%s, s2=%s\n", str1, str2);
                bool result = match(str1, str2);
                check(result, str1, str2);
                free(str1);
                free(str2);
                return result;
            }
            free(str1);
            free(str2);
        }
    }
    return 0;
}

Step 4: Editing the Makefile

Finally, add a rule to your Makefile which compiles the fuzzer.

fuzzer: fuzz.c match.c
    clang -g -fsanitize=address,undefined,fuzzer -fno-omit-frame-pointer $^ -o $@

Getting Checked Off

Lab \(n\) will be due during lab hours for lab \(n+1\). You will need to get manually checked off for most labs as we will not be using totally automated tests. To get checked off, join the office hours queue, and we will get to you as soon as possible.