CS 3 (Spring 2022) Lab 04: Iterative Development

This lab will focus on the process of developing programs with many features.

Setup

Register for the lab using grinch: https://grinch.caltech.edu/register and clone the repository as explained in the setup instructions.

Iterative Development

When desiging large programs (or even small ones!), it’s very important to have a short development cycle. The typical pattern people first use to write code is: (1) write all the code, (2) run the code on a large test case, (3) attempt to debug error \(n\), (4) cry, (5) go to 1.

While this is indeed a “cycle”, it’s not a particularly effective one. The biggest issues that arises is that the developer has implemented so many features at once that they are unable to isolate where the issue in their code actually is.

In this lab, we will force you to use an iterative development cycle (with a little bit of test-driven development thrown in) which looks more like

  1. Isolate what the individual features to implement are (what exactly a “feature” is is described below)
  2. Write a few tests for the \(n\)th feature to make sure you understand the problem
  3. Write feature \(n\)
  4. Run tests on code so far
  5. Debug feature \(n\)
  6. Increment \(n\) and go to 1

This version of a development cycle is centered around the idea of a “feature”. Usually, a large program can be split into modules and those modules can be further sub-divided into individual “features” (or “tasks” if you prefer) which can each be implemented semi-independently. To give you a sense of what this looks like, we will first describe a programming problem, then split it into features, then you will implement the features one by one using this strategy.

Background and Problem Definition

A regular expression (or regexp) is a pattern that can be used to match against text strings. The UNIX utility grep can search through files for lines that contain the pattern. CS 21 covers the mathematical theory behind regular expressions quite extensively, but we will only implement a small subset of the full language today. In particular, for our purposes, a regular expression will contain the following features:

(Notably, we are missing union among other things like character classes.)

More or less, this means if we have a string it matches a regular expression if all the letters match (where * means the previous letter can be omitted or repeated).

The whole string must match the regexp!!! There cannot be any characters left over in a match!!!

The features above are basically the things to implement; we split it up a little more like follows:

Feature 1: Implement string literals (multiple literal characters in a row).

Feature 2: Implement dot (‘.’).

Feature 3: Implement star when it follows a character literal (‘a*’)

Feature 4: Implement star when it follows a dot (‘.*’)

Part 1: Implement String Literals

In the file test.sh, you will find a tester for your program with a single test and a comment that describes how to add more tests. The first thing you should do is write at least five tests that exercise edge cases of feature 1. The tester uses the UNIX utility grep (which is known to work) as a reference implementation to test your code.

After you implement feature 1, you should test feature 1 by running make test. DO NOT continue until it is working.

Part 2: Implement Dot

After you implement feature 2, you should test feature 2. DO NOT continue until it is working.

Part 3: Implement Star for Character Literals

After you implement feature 3, you should test feature 3. DO NOT continue until it is working.

Part 4: Implement Star for Dot

Part 5: Fuzzing your Regexp Engine

A “fuzzer” is an automated tool that tries random inputs on your program in an attempt to make it fail. These inputs tend to start out completely random, but they narrow down as the fuzzer determines features of your code. For example, in this lab, your fuzzer will quickly determine that * is an interesting character and attempt to insert stars in many places in the random string.

In this section, we will be using clang’s built-in fuzzer to attempt to find strings that make a reference implementation of grep output different results than your implementation of match. Additionally, it will try to find inputs that cause asan to fail. This is very powerful, because it is extremely unlikely that the fuzzer will miss an entire case of outputs.

Run make fuzzer to compile the fuzzer to a binary called fuzzer.

Getting Checked Off

Labs will be due the following Tuesday from when they are assigned, at 11:30 PST. You will need to get manually checked off for most labs, as we will not be using totally automated tests. To get checked off, make a ticket saying you are done, and we will check you off as soon as possible. Please include the link to your repo in the ticket.