CS 3 (Spring 2024) Solo 02: HTTP Requests and Servers, Oh my!

In this project, we will use the libraries from the last project to parse HTTP requests.

Setup

Register for the project using grinch: https://grinch.caltech.edu/register and clone the repository as explained in the setup instructions. You MUST work solo on this project! You may not collaborate at all with anyone else! Please use the SSH link found on the registration page and in the email sent to you; it should look something like git@gitlab.caltech.edu:cs3-24sp/solo02-blank.git.

Introduction

When you open a URL in your browser, what exactly is happening? We will answer this question over the next few solo projects as we attempt to “respond” to a browser’s request for a website.

URLs consist of three parts: (1) the protocol, (2) the remote server, and (3) the path. For example, the website you requested to read this document, https://sof.tware.design/24sp/projects/solo/02, uses the “https” protocol, the “sof.tware.design” remote server, and the path “24sp/projects/solo/02”.
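To make the three parts concrete, here is a minimal sketch in C that splits a URL of the form protocol://server/path. The url_parts_t struct and the url_split function are hypothetical names invented for this illustration; they are not part of the project code, and the buffer sizes are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for the three URL parts (not a project type). */
typedef struct {
    char protocol[16];
    char server[128];
    char path[256];
} url_parts_t;

/* Splits "protocol://server/path" into its three parts. The path is stored
 * without its leading '/', matching the example in the text. Returns false
 * if the URL has no "://" or if a part does not fit in its buffer. */
bool url_split(const char *url, url_parts_t *out) {
    assert(url != NULL && out != NULL);

    const char *sep = strstr(url, "://");
    if (sep == NULL) return false;
    size_t protocol_len = (size_t)(sep - url);
    if (protocol_len >= sizeof(out->protocol)) return false;

    /* The server runs from just after "://" to the next '/', if any. */
    const char *server = sep + 3;
    const char *slash = strchr(server, '/');
    size_t server_len = slash ? (size_t)(slash - server) : strlen(server);
    if (server_len >= sizeof(out->server)) return false;

    /* Everything after that '/' is the path (empty if there is none). */
    const char *path = slash ? slash + 1 : "";
    if (strlen(path) >= sizeof(out->path)) return false;

    memcpy(out->protocol, url, protocol_len);
    out->protocol[protocol_len] = '\0';
    memcpy(out->server, server, server_len);
    out->server[server_len] = '\0';
    strcpy(out->path, path);
    return true;
}
```

Calling url_split on https://sof.tware.design/24sp/projects/solo/02 fills in the protocol “https”, the server “sof.tware.design”, and the path “24sp/projects/solo/02”.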

While, nowadays, you will mostly see the “https” protocol, there are others, such as “http”, “file”, and “mailto”. The main difference between “https” and “http” is that “https” encrypts the connection and is secured using a certificate, which your browser checks.

In practice, HTTP has been more or less phased out in favor of HTTPS for its security, but in this project we’ll be working with HTTP because it turns out that security makes things much more complicated.

After you complete this project, you’ll be able to run make server to start your own web server and connect to it from your own browser.

Hypertext Transfer Protocol (“HTTP”)

After you type a URL into your browser and hit enter, your browser constructs a “request” to the remote server specified in the URL.

HTTP requests are strings consisting of five parts: the method, the path, the version string, a list of key-value pairs, and an optional body.

What’s in a Web Server?

On the other side (the one being “requested from”) is a “web server”. A “web server” is a fancy name for a computer that is waiting (in a loop) for other computers to make requests using the HTTP(S) protocol. Since we know the format of an HTTP request, this more or less boils down to waiting for and processing connections, one at a time. The general shell of a web server is a loop: wait for a connection, read the request, send back a response, and close the connection.
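As a rough illustration of that shell, here is a minimal sketch in C. It assumes plain POSIX sockets rather than the libraries from the last project, so the names handle_connection and serve_forever are hypothetical, and the reply is hard-coded instead of being built from a parsed request.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Reads one request from the connection, sends a fixed reply, and closes
 * the connection. A real server would parse the request before replying. */
void handle_connection(int conn_fd) {
    assert(conn_fd >= 0);
    char request[4096];
    ssize_t n = read(conn_fd, request, sizeof(request) - 1);
    if (n > 0) {
        const char *reply =
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/html\r\n"
            "\r\n"
            "<h1>Hello!</h1>";
        /* Short writes are ignored to keep the sketch small. */
        ssize_t sent = write(conn_fd, reply, strlen(reply));
        (void)sent;
    }
    close(conn_fd);
}

/* Listens on the given port and handles connections one at a time,
 * forever. Returns -1 only if the listening socket cannot be set up. */
int serve_forever(uint16_t port) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listen_fd, 16) < 0) {
        close(listen_fd);
        return -1;
    }

    for (;;) {
        /* Block until the next client connects. */
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd < 0) continue; /* failed accept: keep waiting */
        handle_connection(conn_fd);
    }
}
```

Keeping the per-connection work in its own function means the loop itself stays tiny, and the request handling can be exercised without opening a network port.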

Writing the Web Server

First things first, copy over your code from the last solo project.

Next, let’s start sketching out the web server itself. It’s not going to do much yet, but you’ll be able to build on it. At any point, you can run make server to run your server.

There aren’t any tests for the server code until the end of this project, but you should test it yourself by connecting to it via the link printed by make server.

Before you implement the parsing, we need to write some bookkeeping utilities.

The Format of an HTTP Request

An HTTP request will look like:

[METHOD] [PATH] [VERSION STRING]\r\n
[KEY 1]: [VALUE 1]\r\n
[KEY 2]: [VALUE 2]\r\n
...
[KEY n]: [VALUE n]\r\n
\r\n

where each of [METHOD], [PATH], and [VERSION STRING] is guaranteed not to contain any spaces, \n, or \r, and each [KEY #] is guaranteed not to contain any :’s, \n, or \r. Each [VALUE #] is guaranteed not to contain any \n or \r (but may contain :’s).

The method, path, and version are separated by single spaces, and every line ends in \r\n. Each key-value pair is on its own line, with the key and the value separated by “: ” (a colon followed by a space). The request header is terminated by an empty line, that is, a \r\n immediately after the \r\n that ends the previous line. The request may also include a body after this final \r\n, but you will not be parsing that in this project and may assume it doesn’t exist.
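To illustrate how this format can be taken apart, here is a minimal sketch in C using only the standard library. It is not the interface you are asked to implement: request_t, header_t, request_parse, and the MAX_HEADERS limit are hypothetical names made up for this example, and your project's types and the libraries from the last project will differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_HEADERS 16 /* arbitrary cap chosen for this sketch */

typedef struct {
    char *key;
    char *value;
} header_t;

typedef struct {
    char *method;
    char *path;
    char *version;
    header_t headers[MAX_HEADERS];
    size_t num_headers;
} request_t;

/* Parses buf in place: separators are overwritten with '\0' and the fields
 * of out point into buf. Returns false if the input is malformed. */
bool request_parse(char *buf, request_t *out) {
    assert(buf != NULL && out != NULL);
    out->num_headers = 0;

    /* Request line: [METHOD] [PATH] [VERSION STRING]\r\n */
    char *line_end = strstr(buf, "\r\n");
    if (line_end == NULL) return false;
    *line_end = '\0';

    out->method = buf;
    char *space = strchr(buf, ' ');
    if (space == NULL) return false;
    *space = '\0';
    out->path = space + 1;
    space = strchr(out->path, ' ');
    if (space == NULL) return false;
    *space = '\0';
    out->version = space + 1;

    /* Key-value lines, up to the empty line that ends the header. */
    char *line = line_end + 2;
    while (true) {
        line_end = strstr(line, "\r\n");
        if (line_end == NULL) return false; /* missing terminator */
        if (line_end == line) return true;  /* empty line: done */
        *line_end = '\0';

        /* Keys contain no ':', so the first ": " is the separator even
         * when the value itself contains colons. */
        char *sep = strstr(line, ": ");
        if (sep == NULL || out->num_headers == MAX_HEADERS) return false;
        *sep = '\0';
        out->headers[out->num_headers].key = line;
        out->headers[out->num_headers].value = sep + 2;
        out->num_headers++;
        line = line_end + 2;
    }
}
```

Parsing in place avoids allocating memory, at the cost of modifying the input buffer; a parser that must leave the request untouched would copy each field instead.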

Parsing the HTTP Request

Now it’s time to implement the parsing.

At this point, your server is capable of parsing requests! Unfortunately, the server is not able to respond to them yet. We’ve written most of the function response_format for you, but you need to finish it.

Responding to the HTTP Request

A response is formatted as follows:

[VERSION STRING] [RESPONSE CODE] [RESPONSE BRIEF]\r\n
[KEY 1]: [VALUE 1]\r\n
...
[KEY n]: [VALUE n]\r\n
\r\n
[BODY]

You’re only going to be supporting [VERSION STRING] = HTTP/1.1 and a single key-value header pair, Content-Type: text/html.
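To show how these pieces fit together into one string, here is a minimal sketch in C. The function name format_response and its parameters are hypothetical and chosen only for this illustration; the provided response_format may have a different signature and behavior.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Status line, the single Content-Type header, an empty line, the body. */
#define RESPONSE_FMT \
    "HTTP/1.1 %d %s\r\n" \
    "Content-Type: text/html\r\n" \
    "\r\n" \
    "%s"

/* Builds a heap-allocated response string that the caller must free.
 * Returns NULL if formatting or allocation fails. */
char *format_response(int code, const char *brief, const char *body) {
    assert(brief != NULL && body != NULL);

    /* snprintf with a NULL buffer only reports the length needed. */
    int len = snprintf(NULL, 0, RESPONSE_FMT, code, brief, body);
    if (len < 0) return NULL;

    char *out = malloc((size_t)len + 1);
    if (out == NULL) return NULL;
    snprintf(out, (size_t)len + 1, RESPONSE_FMT, code, brief, body);
    return out;
}
```

For example, format_response(200, "OK", "<p>hi</p>") produces a status line, the Content-Type line, an empty line, and then the body <p>hi</p>.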

One of the primary components of the response is the “response status code”, which gives the computer a brief summary of the status of the response. Examples include “200” for “everything ok” and the ever-familiar “404” for “not found.”

The [RESPONSE BRIEF] is a short human-readable summary of the status code. For 200 this is "OK", and for 404 it is "Not Found".

See the response_format documentation for details on what status codes you should support or see Mozilla’s documentation if you’re curious about other status codes.

If you look at include/http_response.h, you will see an “enum” of status codes. Before you continue, you’ll want to familiarize yourself with enums by reading our explanation of enums and switch statements.

Now, go ahead and look at response_format. Since this function is mostly nothing new, we’ve done most of the work for you. However, you do need to fill out the status_brief function. You’ll want to use a switch statement.
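If the combination of an enum and a switch statement is new to you, here is a minimal sketch of the pattern in C. The enum example_code_t and the function example_brief are hypothetical; the actual enum in include/http_response.h has its own names and may list different status codes, so treat this only as an illustration of the shape.

```c
#include <stddef.h>

/* Hypothetical enum with the two status codes mentioned in the text. */
typedef enum {
    EXAMPLE_OK = 200,
    EXAMPLE_NOT_FOUND = 404,
} example_code_t;

/* Maps a status code to its human-readable brief, or NULL if the code is
 * not one this sketch knows about. */
const char *example_brief(example_code_t code) {
    switch (code) {
        case EXAMPLE_OK:
            return "OK";
        case EXAMPLE_NOT_FOUND:
            return "Not Found";
        default:
            return NULL;
    }
}
```

Because every case returns, no break statements are needed here; a case that falls off its end without a return or break continues into the next case.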

Run make test to make sure you’ve finished everything.

Push your code to GitLab to finish the project.