Modules are the basic unit of encapsulation. They exist to hide away implementation details and present an interface that can be used without worrying about those implementation details.
This makes it easier to write code because it means that when you’re writing code that uses the module, you don’t need to think about how it’s implemented and when you’re writing the module, you don’t need to think about how it’s used.
It also means that if you come up with a better algorithm later, you can just switch it out. And having fewer things to think about means you’re less likely to write bugs.
Another key aspect of encapsulation is that it allows you to ensure that “invariants”, properties of your data that the rest of the code relies on always being true, are maintained.
Encapsulation in Java
In Java, classes serve the role of both types and modules. Encapsulation is done by marking implementation details as private and the presented interface as public. Things outside the class can then only access public things and not private ones.
Consider the class:
import java.util.Arrays;

public class Foo {
    public int[] sorted;

    public void sort(int[] vals) {
        // sorts vals in place, in ascending order
        Arrays.sort(vals);
    }

    public Foo(int[] vals) {
        sort(vals);
        this.sorted = vals;
    }

    /**
     * Returns the smallest stored integer
     */
    public int smallest() {
        return sorted[0];
    }

    public void print() {
        System.out.println(Arrays.toString(sorted));
    }
}
This class has a problem. Consider this code:
public class Main {
    public static void main(String[] args) {
        Foo foo = new Foo(new int[] {2, 5, 3, 4});
        foo.sorted[3] = 1;
        System.out.println(foo.smallest());
    }
}
Then Main.main would print 2 even though the smallest element in the array is now 1!
For the smallest method to be correct, sorted must stay sorted no matter what happens elsewhere in the code. In other words, an invariant of this class is that sorted is sorted.
To prevent the invariant from being violated, we would instead declare sorted as private:
class Foo {
    private int[] sorted;
    // ...
}
Additionally, while sort is a useful helper function here, a general sort function has nothing to do with the interface of Foo, and we don’t want to worry about changing or removing it because some other piece of code is relying on it. So we would mark that private too.
class Foo {
    // ...
    private static void sort(int[] vals) {
        // ...
    }
    // ...
}
So how do we do this in C?
Header files
In C, unlike in Java, types and modules are distinct. We can define multiple types in the same module, and we can have modules that define no types of their own at all. For example, the string.h library, which you can access by writing #include <string.h> at the start of a C file, defines no types of its own (it only reuses standard ones such as size_t); it’s just a module containing functions for working with char *.
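As a small illustration of a module that exports only functions, the sketch below calls two string.h functions, strlen and strcmp, on plain char * values. The helper name same_text is made up for this example and is not part of string.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical helper (not part of string.h): reports whether two
// NUL-terminated strings have the same length and the same contents.
bool same_text(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    // strlen counts the characters before the terminating '\0'.
    if (strlen(a) != strlen(b)) {
        return false;
    }
    // strcmp returns 0 exactly when the two strings are equal.
    return strcmp(a, b) == 0;
}
```

Nothing here needed a new type: the module’s whole interface is a set of functions that operate on char *.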
A module in C is a pair of files, module.h (the “header file”) and module.c (the “source file”). The header file contains “declarations” of all the functions and types (which may be opaque or concrete, more on that shortly) that the module publicly exposes or exports. The source file contains implementations of the functions and definitions of opaque types.
So how would the code above look in C?
Well, let’s start with writing our header file, foo.h (while types and modules are distinct, it is common for them to share names if the module defines a single primary type and functions to work with it):
// foo.h
typedef struct foo foo_t;
foo_t *foo_init(int *bar, size_t len);
int foo_smallest(foo_t *foo);
void foo_print(foo_t *foo);
void foo_free(foo_t *foo);
Let’s note some important details:
- The line typedef struct foo foo_t declares the existence of a type foo_t, but we don’t say what it contains. This is called an “opaque type” and is C’s equivalent of making fields private. Note that C doesn’t give us per-field granularity: either all or none of the fields are public.
- foo_init is our equivalent of a constructor. C doesn’t have special constructor syntax, so this is just a function which creates a foo_t. It takes a length argument because, unlike Java arrays, C pointers don’t store the length.
- We’ve prefixed smallest and print with foo. This is because while C has modules, it does not have namespaces (we can’t do Foo.print). To avoid colliding with other modules, a module prefixes all of its functions and types with a common prefix.
- smallest and print now have arguments. Since C doesn’t have methods, we instead pass our structs as the first argument of every function in the module that works on them.
- We’ve gained an additional function: foo_free. In Java, the garbage collector handles this part, but in C we need to clean up memory manually, so foo_free exists to clean up everything that foo_init sets up.
We declared foo_t as an opaque type by stating that it exists with the typedef but not specifying its fields. This effectively makes all the type’s fields “private”: users of the module can pass around pointers to a foo_t, but they cannot read or modify its fields directly.
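To see what an opaque type allows, here is a minimal single-file sketch. It uses a made-up counter module (counter_t and the counter_ functions are hypothetical names, not part of the foo example), with the would-be header, user code, and source file stacked in one file so the ordering is visible.

```c
#include <assert.h>
#include <stdlib.h>

// --- What the header would contain (hypothetical counter module) ---
typedef struct counter counter_t;  // opaque: the fields are not visible here

counter_t *counter_init(void);
void counter_increment(counter_t *counter);
int counter_value(const counter_t *counter);
void counter_free(counter_t *counter);

// --- What user code can do with only the declarations above ---
// It can hold and pass around a counter_t *, but writing counter->value
// here would not compile, because the struct's fields are still unknown.
int count_to_three(void) {
    counter_t *counter = counter_init();
    assert(counter != NULL);
    for (int i = 0; i < 3; i++) {
        counter_increment(counter);
    }
    int result = counter_value(counter);
    counter_free(counter);
    return result;
}

// --- What the source file would contain ---
struct counter {
    int value;  // only the functions below ever touch this field
};

counter_t *counter_init(void) {
    counter_t *counter = malloc(sizeof(counter_t));
    if (counter != NULL) {
        counter->value = 0;
    }
    return counter;
}

void counter_increment(counter_t *counter) {
    counter->value++;
}

int counter_value(const counter_t *counter) {
    return counter->value;
}

void counter_free(counter_t *counter) {
    free(counter);
}
```

Because count_to_three appears before the struct definition, it is in the same position as code that has only seen the header: it has to go through the module’s functions.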
We could also have declared a concrete type point_t, with something like this:
typedef struct Point {
    int x;
    int y;
} point_t;
Here, it makes sense to declare point_t as a concrete type because there’s no invariant to be maintained between x and y, so users freely changing them independently will not break anything, and making the type concrete simplifies the code.
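For contrast with the opaque foo_t, the sketch below shows what a concrete type permits: any code that sees the definition can read and write the fields directly. The helper point_translate is a hypothetical function added only for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Point {
    int x;
    int y;
} point_t;

// Hypothetical helper: moves a point by (dx, dy) in place.
void point_translate(point_t *point, int dx, int dy) {
    assert(point != NULL);
    point->x += dx;
    point->y += dy;
}
```

A caller could write point_t p = {.x = 1, .y = 2}; and then assign p.y = 7; directly, with no accessor function, because x and y have no relationship that such a write could break.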
Let’s write the source file now:
// foo.c
#include <stdlib.h>
#include <stdio.h>
#include "foo.h"

struct foo {
    int *sorted;
    size_t len;
};

// Comparison function for qsort: orders ints in ascending order.
static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

// static makes sort visible only inside this source file.
static void sort(int *vals, size_t len) {
    qsort(vals, len, sizeof(int), compare_ints);
}

// Takes ownership of vals, which must be heap-allocated and non-empty.
foo_t *foo_init(int *vals, size_t len) {
    foo_t *new = malloc(sizeof(foo_t));
    if (new == NULL) {
        return NULL;
    }
    sort(vals, len);
    new->sorted = vals;
    new->len = len;
    return new;
}

int foo_smallest(foo_t *foo) {
    return foo->sorted[0];
}

void foo_print(foo_t *foo) {
    printf("%d", foo->sorted[0]);
    for (size_t i = 1; i < foo->len; i++) {
        printf(", %d", foo->sorted[i]);
    }
    printf("\n");
}

void foo_free(foo_t *foo) {
    free(foo->sorted);
    free(foo);
}
Here we defined the type, whose fields are all private because the definition is in the source file and not the header file, and then implemented all the functions. Note that sort doesn’t have the foo prefix, unlike everything else. This is because sort is declared static, which makes it available only inside this source file, so we’re not worried about namespace collisions.
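One detail worth noticing is that foo_init keeps the caller’s pointer, so the caller can still write to the array through its own copy of that pointer and unsort it. A possible alternative, sketched below under the assumption that copying the data is acceptable, is a constructor that makes its own copy. The name foo_init_copy and the comparator compare_ints are hypothetical, and qsort from stdlib.h stands in for the sorting step.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct foo foo_t;

struct foo {
    int *sorted;
    size_t len;
};

// Comparison function for qsort: orders ints in ascending order.
static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

// Hypothetical variant of foo_init: copies vals instead of keeping the
// caller's pointer, so later writes through vals cannot unsort the copy.
// Returns NULL if vals is NULL, len is 0, or an allocation fails.
foo_t *foo_init_copy(const int *vals, size_t len) {
    if (vals == NULL || len == 0) {
        return NULL;
    }
    foo_t *new_foo = malloc(sizeof(foo_t));
    if (new_foo == NULL) {
        return NULL;
    }
    new_foo->sorted = malloc(sizeof(int) * len);
    if (new_foo->sorted == NULL) {
        free(new_foo);
        return NULL;
    }
    memcpy(new_foo->sorted, vals, sizeof(int) * len);
    qsort(new_foo->sorted, len, sizeof(int), compare_ints);
    new_foo->len = len;
    return new_foo;
}

int foo_smallest(foo_t *foo) {
    assert(foo != NULL && foo->len > 0);
    return foo->sorted[0];
}

void foo_free(foo_t *foo) {
    free(foo->sorted);
    free(foo);
}
```

The trade-off is an extra allocation and copy in exchange for an invariant that no outside code can break; with this variant the caller also stays responsible for its own array.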
Note that foo.c has #include "foo.h" at the top. Source files should always include their own headers, so that the compiler can check that each definition matches its declaration.
So, how do you actually use a module?
So you’ve created foo.h and foo.c and now you want to actually use the module you’ve made. Let’s write a main.c file which does. main isn’t a module here; it’s the root of our application, which means it doesn’t need a header file, since nobody is going to be importing main.
So we just include "foo.h" like we include <stdio.h> for printf and such, right?
// main.c
#include <stdio.h>
#include <stdlib.h>
#include "foo.h"
int main() {
    size_t len = 4;
    int *vals = malloc(sizeof(int) * len);
    vals[0] = 2;
    vals[1] = 5;
    vals[2] = 3;
    vals[3] = 4;
    foo_t *foo = foo_init(vals, len);
    printf("%d\n", foo_smallest(foo));
    foo_free(foo);
}
VSCode doesn’t make any red squiggles, so everything’s fine, right?
Let’s compile it. We run:
clang main.c -o main
Uh oh.
yshaluno@labradoodle:~/cs3/module_example$ clang main.c -o main
/usr/bin/ld: /tmp/main-0e13ca.o: in function `main':
main.c:(.text+0x58): undefined reference to `foo_init'
/usr/bin/ld: main.c:(.text+0x65): undefined reference to `foo_smallest'
/usr/bin/ld: main.c:(.text+0x84): undefined reference to `foo_free'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
What gives? Understanding exactly what this error means is beyond the scope of this reading. The key thing to note is the linker command failed line at the bottom, which tells you that this is not a normal compiler error like the ones you’re used to from missing semicolons.
Instead, it means that the compiler was able to find the declarations of those functions (foo_init, foo_smallest, and foo_free), but the linker, the step that combines compiled code into an executable, wasn’t able to find their implementations. All #include actually does is paste in the included file: before the compiler does anything complicated, a much simpler “preprocessor” runs and copy-pastes the included file in place of the #include line.
Thus, our main.c is equivalent to:
// main.c
#include <stdio.h>
#include <stdlib.h>
typedef struct foo foo_t;
foo_t *foo_init(int *bar, size_t len);
int foo_smallest(foo_t *foo);
void foo_print(foo_t *foo);
void foo_free(foo_t *foo);
int main() {
    size_t len = 4;
    int *vals = malloc(sizeof(int) * len);
    vals[0] = 2;
    vals[1] = 5;
    vals[2] = 3;
    vals[3] = 4;
    foo_t *foo = foo_init(vals, len);
    printf("%d\n", foo_smallest(foo));
    foo_free(foo);
}
It makes sense that compiling this doesn’t work. We’ve declared all the functions we want, but we haven’t implemented them anywhere: we only told the compiler about main.c, which doesn’t mention foo.c anywhere. It turns out that fixing this is quite simple: we just need to tell the compiler about foo.c.
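The difference between a declaration and an implementation can be seen in a single file. In the sketch below, the function names clamp and clamp_percent are invented for illustration; the point is only that code calling clamp compiles with nothing more than the declaration in view.

```c
#include <assert.h>

// Declaration only: promises that a function with this signature exists
// somewhere. This is the kind of line a header file contributes.
int clamp(int value, int low, int high);

// Code that calls clamp compiles using just the declaration above.
int clamp_percent(int value) {
    return clamp(value, 0, 100);
}

// Definition: the actual implementation. In a multi-file project this
// would live in a source file that must also be given to the compiler;
// without it, the build stops with an "undefined reference" error.
int clamp(int value, int low, int high) {
    assert(low <= high);
    if (value < low) {
        return low;
    }
    if (value > high) {
        return high;
    }
    return value;
}
```

In our case the declarations come from foo.h and the definitions live in foo.c, so main.c compiles on its own but cannot be linked until foo.c is part of the build.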
yshaluno@labradoodle:~/cs3/module_example$ clang main.c foo.c -o do_thing
yshaluno@labradoodle:~/cs3/module_example$ ./do_thing
2
There are better ways to do this, but this is good enough for now. Our provided Makefiles will handle the fancy things for you.
fatal error: 'foo.h' file not found
You may notice that our provided repositories are organized more carefully so that they’re easier to navigate.
In particular, they place all the header files in a directory called include/. Let’s do that.
yshaluno@labradoodle:~/cs3/module_example$ mkdir include
yshaluno@labradoodle:~/cs3/module_example$ mv foo.h include
yshaluno@labradoodle:~/cs3/module_example$ clang main.c foo.c -o do_thing
main.c:3:10: fatal error: 'foo.h' file not found
#include "foo.h"
^~~~~~~
1 error generated.
Uh oh. What gives? Well, #include "foo.h" tells the compiler to look for a file called foo.h, but that doesn’t mean it magically knows where you put it. By default, the compiler looks in the directory of the file that contains the #include (here, the same as our working directory) and then falls back to a list of standard locations where it expects header files to be (things like /usr/include; try running ls /usr/include on Labradoodle and see what happens). If we put foo.h at include/foo.h, the compiler won’t be able to find it.
To fix this, we simply tell the compiler where to look for it with the -I[path] flag:
yshaluno@labradoodle:~/cs3/module_example$ clang main.c foo.c -o do_thing -Iinclude
yshaluno@labradoodle:~/cs3/module_example$ ./do_thing
2
What’s the deal with #ifndef __FOO_H
If you’ve looked at the header files we’ve written, you’ll notice that they’re all wrapped in this pattern of:
#ifndef __MODULE_H
#define __MODULE_H
// actual contents go here
#endif
The exact spelling of the identifier used in place of __MODULE_H varies, but it should be unique to the header so that it doesn’t collide with any other identifier.
This pattern is called a “header guard”, and one should be placed in every header. Why do we have this? There are a couple of reasons. First, suppose foo.h also defined a constant:
const int FOO_MAX = INT_MAX;
Then, say we have:
// bar.h
#include "foo.h"
int bar_bar(foo_t *foo);
// baz.h
#include "foo.h"
int baz_baz(foo_t *foo);
And then we have a main.c
which uses both of them:
// main.c
#include "bar.h"
#include "baz.h"
// ...
If you tried to compile this, you would get an error: redefinition of 'FOO_MAX'. You’re only allowed to define a constant once, but we included the file twice, once through bar.h and once through baz.h.
Header guards solve this problem by ensuring that the contents of every header are included at most once per source file. #ifndef says “include the code inside only if the preprocessor macro called __FOO_H is not defined” and the following line immediately defines it.
Suppose we wrote:
//foo.h
#ifndef __FOO_H
#define __FOO_H
const int FOO_MAX = INT_MAX;
// ...
#endif
If we then include this twice, the first time the #ifndef check passes, since __FOO_H hasn’t been defined yet. But the second time, __FOO_H is already defined, and so #ifndef ensures the constant (and the rest of the header file) doesn’t appear a second time.
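Since #include is just a paste, the effect of a header guard can be reproduced in one file by writing the guarded contents out twice. This is only a sketch: the names __DEMO_H, DEMO_MAX, and demo_clamp are made up, and the guard follows the same naming pattern as above.

```c
#include <assert.h>

// First "inclusion": __DEMO_H is not defined yet, so the body is kept
// and __DEMO_H becomes defined.
#ifndef __DEMO_H
#define __DEMO_H
const int DEMO_MAX = 100;
#endif

// Second "inclusion": __DEMO_H is already defined, so the preprocessor
// skips everything up to #endif and DEMO_MAX is not defined again.
#ifndef __DEMO_H
#define __DEMO_H
const int DEMO_MAX = 100;
#endif

// Hypothetical function that uses the constant.
int demo_clamp(int value) {
    assert(DEMO_MAX > 0);
    return value > DEMO_MAX ? DEMO_MAX : value;
}
```

Deleting the #ifndef, #define, and #endif lines from both copies leaves two definitions of DEMO_MAX, which is the redefinition error described earlier.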
Header guards should still be used in cases where there are no constants defined because they also speed up compilation by telling the compiler that it doesn’t need to do this work again.