CS 3 (Spring 2024) Ownership of Allocated Memory

Let’s begin by comparing equivalent Java and C code.

Java is a “memory safe” languages which manage your memory for you with a garbage collector. The garbage collector’s job is to keep track of which allocations are currently in use (e.g., objects created through new in Java). Once you’re no longer using an allocation, the garbage collector will automatically deallocate this memory for you, which allows it to be reused the next time something needs to be allocated.

public class Foo {
    private int bar;


    public Foo(int x) {
        bar = x;
    }
}

public class Thinger {
    public static void do_thing() {
        Foo foo = new Foo(3);
        System.out.println(foo.bar);
        // foo goes "out of scope"
    }
}

Since this is Java, the garbage collector automatically notices that foo isn’t being used any more at some point after do_thing finishes running and releases it.

C requires the programmer to do “manual memory management”. That is, it’s entirely the programmer’s job to make sure to release everything they allocate. One of the fundamental tenets of C programming is that for every malloc there is exactly one free (which is not true in the code below! Uh oh!).

typedef struct foo {
    int bar;
} foo_t;

foo_t *foo_init(int x) {
    foo_t *new = malloc(sizeof(foo_t));
    new->bar = x;
    return new;
}

void do_thing() {
    foo_t *foo = foo_init(3);
    printf("%d\n", foo->bar);
    // foo goes "out of scope"
}

Since this is C, this code would be considered incorrect as it has a “memory leak”. A “memory leak” is where a memory allocation becomes unreachable without being freed; this is problematic, because nobody can reuse this memory ever again during the current run of the program!

Delving a little deeper into the C example, you might have noticed that new is returned from foo_init without being freed. This is not an error! The whole point of this function is to do the allocation and “pass” it to the caller to be used later! You can think of this kind of function as a combination of the new keyword in Java followed by the foo constructor in Java.

Next, do_thing calls foo_init and, again, assigns it to a variable, foo. This is also good. The problem occurs when do_thing returns. Since, foo was the program’s only pointer to that memory allocation, when foo went out of scope, nobody has any pointer to that memory anymore and the memory is unreachable (or “leaked”).

The correct implementation of do_thing would be

void do_thing() {
    foo_t *foo = foo_init(3);
    printf("%d\n", foo->bar);
    free(foo);
}

So this memory management thing is simple enough, right? You just free pointers when you’re done with them and they’d go out of scope? Unfortunately, it’s not so simple.

Consider the following slight alteration of do_thing:

void do_another_thing() {
    foo_t *foo = foo_init(3);
    foo_t *foo2 = foo;
    printf(
        "%d = %d\n", 
        foo->bar, 
        foo2->bar
    );
    free(foo2);
    free(foo);
}

Recall that the line foo_t *foo2 = foo; does not allocate any new memory! Instead, it “copies the pointer”. This means that free(foo2) actually frees the same allocation as free(foo). This results in a “double free” error – because we’ve tried to free the same allocation twice. Not good!

A Solution: “Owning” Allocations

We solve this conceptual issue by denoting every pointer is either the “owner” or “borrower” of an allocation. We insist that, in our code, every allocation has exactly one owner which is responsible for making sure the data ultimately gets freed.

Consider the two following functions:

foo_print

void foo_print(foo_t *foo) {
    printf("%d\n", foo->bar);
}

Since foo_print’s only job is to read and print the data in foo, it shouldn’t take ownership of the data, but, rather just temporarily “borrow” it for use before returning it. To say it another way, after foo_print has run, the allocation foo points to should still be usable by the calling function.

foo_free

void foo_free(foo_t *foo) {
    free(foo);
}

Since foo_free calls free on the allocation, it should be clear that the pointer is invalid after foo_free returns. For foo_free to effectively do it’s job, the caller needs to “give up ownership” of the pointer (i.e., agree not to use it any more). If foo_free were only a borrower, then it wouldn’t be allowed to invalidate the pointer.

Putting everything together:

void bad_thing() {
    foo_t *foo = foo_init(3);
    foo_free(foo); 
    foo_print(foo);
}

This is bad. We’re freeing foo and then we’re using it, which asan would tell us is a use-after-free error and isn’t allowed.

Data Structures can own allocations too!

There’s one final caveat: data structures can own things.

Suppose we go back and modify the definition of foo_t so it instead has an array of integers:

typedef struct foo {
    int *bar;
} foo_t;

foo_t *foo_init(int x, size_t count) {
    foo_t *new = malloc(sizeof(foo_t));
    new->bar = malloc(sizeof(int) * count);
    for (size_t i = 0; i < count; i++) {
        new->bar[i] = x;
    }
    return new;
}

Now suppose we go back to do_thing():

void do_thing() {
    foo_t *foo = foo_init(3, 1);
    printf("%d\n", foo->bar[0]);
    free(foo);
}

We’re now leaking memory again! The array bar is never freed.

The instance of foo_t that foo points to itself owns the array foo->bar. When foo is freed, foo->bar “goes out of scope”, which means we have to free it then. Rather than remembering to do it by hand, we would have a foo_free function as in the example before, but now it’s modified:

void foo_free(foo_t *foo) {
    free(foo->bar);
    free(foo);
}

foo_free is an abstraction that lets users of foo not worry about the internals of what it owns and simply free the entire object, rather than just the top-level allocation, once they’re done, much like foo_init does for constructing foo.

This also applies to more complicated data structures, like our ll_map_t. Consider the ll_put function:

char *ll_put(ll_map_t *dict, char *key, char *value);

This function is inserting a key-value pair into the dictionary and potentially returning a previous value. After this function runs, key and value have been inserted into dict and dict owns key and value. This means that key and value have to be owning pointers.

Since dict is just modified but not freed or moved, dict is a borrowing pointer.

Finally, the return value is a previously-owned value (if it exists) that is being given back to the caller, so the return value is an owned string (which the caller must free).

After ll_put runs, dict owns key and value, which means that the caller is no longer allowed to touch those values and that dict is “responsible” for freeing them. What this means is that somewhere in ll_free, those values also need to be freed.

Recap

A standard model of memory management in C is the “ownership” model, which says that pointers come in 2 flavors:

All owning pointers ultimately come from a call to malloc or similar function and are freed right before they go out of scope.

You can freely produce borrowing pointers from owning pointers, but there must always be exactly one owning pointer for every heap allocation.

Ownership can, however, be freely transferred, including into and out of functions as well as into and out of data structures. When data structures go out of scope, all owned pointers within need to be freed, which is why you generally want a [type]_free() function to release an object and all its allocations.

When you transfer an owning pointer into a function or data structure, you can no longer use the original pointer because the moment you transfer ownership, the pointer could have been freed (or it could be like the key to a hashmap, where the implementation assumes it won’t change).

Footnote: string literals

We can write:

int main() {
    char *str = "Hello, world!";
    printf("%s\n", str);
}

The reason this is correct is that str is not an owning pointer, despite being the only pointer to the string. Owning pointers come from malloc or similar and "Hello, world!" doesn’t. Indeed, string literals are borrowing pointers (you can think of them as being owned by the program as a whole and released when the program ends). The function strdup (an implementation is given below) is sometimes very useful as a result:

char *strdup(const char *s1) {
    size_t len = strlen(s1)
    char *s2 = malloc(len + 1);
    memcpy(s2, s1, len + 1);
    return s2;
}

It takes a borrowed string and it produces an owned copy of it.

So keep in mind: string literals ("Hello, world!", "Hopper", "") are borrowed strings and remember that strdup exists if you need an owned string instead (e.g., ll_put(dict, strdup("Best animal"), strdup("Dogs"))).