Register for the lab using grinch: https://grinch.caltech.edu/register and clone the repository as explained in the setup instructions.
Setup for Windows
This lab requires a UNIX-like environment which means, if you are on Windows, you need to set up and use WSL. Instructions on how to do this can be found here. You will want to compile, run, and test this lab from within WSL.
Once you have installed WSL, you will want to run it from the terminal inside VSCode. There are two ways to do this. First, in the terminal dropdown menu, you can select Ubuntu, as shown in the image below.
Alternatively, you can type bash into your regular terminal, and you should see the WSL environment pop up as shown.
(In case you were wondering why we didn’t have you just do this for the projects, it doesn’t work with SDL…)
When designing large programs (or even small ones!), it’s very important to have a short development cycle. The typical pattern people first use to write code is: (1) write all the code, (2) run the code on a large test case, (3) attempt to debug error \(n\), (4) cry, (5) go to 1.
While this is indeed a “cycle”, it’s not a particularly effective one. The biggest issue that arises is that the developer has implemented so many features at once that they are unable to isolate where the problem in their code actually is.
In this lab, we will force you to use an iterative development cycle (with a little bit of test-driven development thrown in) which looks more like:
1. Identify the individual features to implement (what exactly a “feature” is will be described below)
2. Write a few tests for the \(n\)th feature to make sure you understand the problem
3. Write feature \(n\)
4. Run tests on code so far
5. Debug feature \(n\)
6. Increment \(n\) and go to 1
This version of a development cycle is centered around the idea of a “feature”. Usually, a large program can be split into modules and those modules can be further sub-divided into individual “features” (or “tasks” if you prefer) which can each be implemented semi-independently. To give you a sense of what this looks like, we will first describe a programming problem, then split it into features, then you will implement the features one by one using this strategy.
Background and Problem Definition
A regular expression (or regexp) is a pattern that can be used to match against text strings. The UNIX utility grep can search through files for lines that contain the pattern.
CS 21 covers the mathematical theory behind regular expressions quite extensively, but we will only implement a small subset of the full language today. In particular, for our purposes, a regular expression will contain the following features:
- Letters (Z), which match the literal letter they represent
- A dot (‘.’), which matches any single character
- Two regular expressions concatenated together (‘ab’), which represents first matching the first regex and then matching the second one
- A letter or dot followed by a star (‘a*’), which represents zero or more occurrences of that letter (or, for a dot, of any character)
(Notably, we are missing union among other things like character classes.)
More or less, this means that a string matches a regular expression if all the letters match in order (where * means the previous letter can be omitted or repeated).
The whole string must match the regexp!!! There cannot be any characters left over in a match!!!
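To build intuition for the whole-string rule, here is a small sketch that asks the POSIX regex library the same question by anchoring the pattern with ^ and $. This is not the lab's tester (which uses grep) and not something you should use in your solution; the function name reference_full_match and the 256-byte buffer are assumptions made purely for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical reference checker: returns true only if the WHOLE text
 * matches the pattern, by wrapping the pattern in ^ and $ anchors. */
bool reference_full_match(const char *pattern, const char *text) {
    assert(pattern != NULL && text != NULL);

    /* Build "^pattern$"; give up if it does not fit in the buffer. */
    char anchored[256];
    int written = snprintf(anchored, sizeof anchored, "^%s$", pattern);
    if (written < 0 || (size_t) written >= sizeof anchored) {
        return false;
    }

    /* Compile as a POSIX basic regular expression; we only need yes/no. */
    regex_t compiled;
    if (regcomp(&compiled, anchored, REG_NOSUB) != 0) {
        return false;
    }

    int status = regexec(&compiled, text, 0, NULL, 0);
    regfree(&compiled);
    return status == 0;
}
```

With this sketch, the pattern ab accepts the string ab but rejects abc, because the trailing c is left over.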
The features above are basically the things to implement; we split them up a little further as follows:
Feature 1: Implement string literals (multiple literal characters in a row).
Feature 2: Implement dot (‘.’).
Feature 3: Implement star when it follows a character literal (‘a*’).
Feature 4: Implement star when it follows a dot (‘.*’).
Part 1: Implement String Literals
In the file test.sh, you will find a tester for your program with a single test and a comment that describes how to add more tests. The first thing you should do is write at least five tests that exercise edge cases of feature 1. The tester uses the UNIX utility grep (which is known to work) as a reference implementation to test your code.
Implement at least five tests for feature 1 in the test.sh file.
We’ve already seen that pointers allow us to do a lot of weird things. Up next: you can do arithmetic on pointers. (Whatttttttt?)
If we have a char * pointer p, then p + 1 is a reference to the second character of the string. That is, the following equalities hold:
    p[0] == *p        // We already knew this
    p[1] == *(p + 1)  // This one is new
In fact, in C, the brackets are literally syntactic sugar for pointer arithmetic and dereference.
You may find very simple (p + 1, p + 2, …) pointer arithmetic useful in implementing your regular expression matcher.
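As a minimal sketch of what this looks like in practice, the two helpers below read a string using only pointer arithmetic. The names char_at and walk_length are made up for this illustration and are not part of the lab's starter code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the character i positions past p.
 * In C, p[i] and *(p + i) mean exactly the same thing. */
char char_at(const char *p, size_t i) {
    assert(p != NULL);
    return *(p + i);
}

/* Hypothetical helper: counts the characters in a string by moving
 * the pointer forward one step at a time until it reaches '\0'. */
size_t walk_length(const char *p) {
    assert(p != NULL);
    size_t count = 0;
    while (*p != '\0') {
        p = p + 1; /* p now points at the next character */
        count++;
    }
    return count;
}
```

Note that walk_length only changes its own local copy of the pointer; the caller's pointer still points at the start of the string.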
Implement feature 1 in the match.c file as the match function.
You may choose to implement it recursively or iteratively, but it is far easier to finish the later features if you write it recursively. We recommend that you consume a single character on each recursive call and use the empty string as your base case. Do not do anything fancy (DFAs are great, but we’re looking for a simple implementation here.)
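To make the recommended shape concrete, here is one possible sketch of the recursion: consume one character per call and stop at the empty pattern. The name match_literal and its signature are assumptions for illustration; the match function in your match.c may be declared differently, so treat this as a picture of the idea rather than the expected solution.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical name and signature; the lab's match function may differ.
 * Returns true only if text is exactly the literal pattern. */
bool match_literal(const char *pattern, const char *text) {
    assert(pattern != NULL && text != NULL);

    if (*pattern == '\0') {
        /* Base case: the pattern is used up, so the text must be too. */
        return *text == '\0';
    }
    if (*text == '\0' || *pattern != *text) {
        /* The text ran out early, or the current characters differ. */
        return false;
    }
    /* Consume one character from each and recurse on the rest. */
    return match_literal(pattern + 1, text + 1);
}
```

The base case is what enforces the whole-string rule: when the pattern is empty, any leftover text makes the match fail.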
After you implement feature 1, you should test feature 1 by running make test. DO NOT continue until it is working.
Part 2: Implement Dot
Implement at least five tests for feature 2 (dot) in the test.sh file.
Implement feature 2 in the match.c file as the match function.
This should be a very small change from the previous iteration. It might be useful to extract matching a single character into its own function (that is, use procedural decomposition as you implement features).
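One way the decomposition might look is sketched below: a tiny helper decides whether a single pattern character accepts a single text character, and the recursion calls it instead of comparing characters directly. The names match_char and match_with_dot are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does one pattern character accept one text character?
 * A dot accepts any character, but nothing accepts the end of the string. */
bool match_char(char pattern_char, char text_char) {
    if (text_char == '\0') {
        return false;
    }
    return pattern_char == '.' || pattern_char == text_char;
}

/* Hypothetical name; same recursion as plain literals, with the
 * single-character comparison delegated to match_char. */
bool match_with_dot(const char *pattern, const char *text) {
    assert(pattern != NULL && text != NULL);

    if (*pattern == '\0') {
        return *text == '\0';
    }
    /* && short-circuits, so text + 1 is only used when *text is not '\0'. */
    return match_char(*pattern, *text)
        && match_with_dot(pattern + 1, text + 1);
}
```

Keeping the single-character test in its own function means later features can reuse it without caring whether the pattern character is a letter or a dot.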
After you implement feature 2, you should test feature 2. DO NOT continue until it is working.
Part 3: Implement Star for Character Literals
Implement at least five tests for feature 3 (star for character literals) in the test.sh file.
Implement feature 3 in the match.c file as the match function.
This is the hardest of all the features. In fact, if you’ve used good procedural decomposition, this is the last feature, because you’ll get the fourth one for free. When thinking about the problem recursively, there are two cases if we are attempting to match a starred character such as ‘a*’: either it matches the first character in the string or it does not. Think about how this can be turned into code.
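If you get stuck, the two cases can be sketched roughly as follows. This is only one possible arrangement, written under the assumption that a single-character helper already exists; the names match_one and match_sketch are hypothetical, and your own match function may be organized differently. Try the feature yourself before reading it closely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: a dot accepts any real character,
 * a letter accepts only itself, and nothing accepts '\0'. */
bool match_one(char pattern_char, char text_char) {
    return text_char != '\0'
        && (pattern_char == '.' || pattern_char == text_char);
}

/* Hypothetical name; a sketch of the two-case recursion for star. */
bool match_sketch(const char *pattern, const char *text) {
    assert(pattern != NULL && text != NULL);

    if (*pattern == '\0') {
        return *text == '\0';
    }
    if (pattern[1] == '*') {
        /* Case 1: the starred character matches nothing, so skip
         * both the character and the star in the pattern. */
        if (match_sketch(pattern + 2, text)) {
            return true;
        }
        /* Case 2: the starred character matches the first character
         * of the text; consume it and keep the same starred pattern. */
        return match_one(*pattern, *text) && match_sketch(pattern, text + 1);
    }
    /* No star: match one character and move both pointers forward. */
    return match_one(*pattern, *text)
        && match_sketch(pattern + 1, text + 1);
}
```

Because match_one already treats a dot like any other pattern character, the starred-dot case needs no extra code in this arrangement. Trying the zero-occurrence case first is a design choice; either order gives the same answers, since the function tries both before reporting failure.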
After you implement feature 3, you should test feature 3. DO NOT continue until it is working.
Part 4: Implement Star for Dot
Implement at least five tests for feature 4 (star for dot) in the test.sh file.
Implement feature 4 in the match.c file as the match function if it isn’t already working on your tests.
Getting Checked Off
Lab \(n\) will be due during lab hours for lab \(n+1\). You will need to get manually checked off for most labs as we will not be using totally automated tests. To get checked off, join the office hours queue, and we will get to you as soon as possible.