Objects

Object-Oriented Programming (in short “OOP”) is a programming paradigm in which data is tied to the code that is supposed to process it; an object (i.e. a structure containing some data) comes with its own handling functions (the methods).

Many programming languages are said to be “object-oriented” in that they offer syntactic and semantic constructions that help in using OOP. Prime examples of such languages include Java, Smalltalk, C#, JavaScript, C++, OCaml, Python, and many (many) others. However, a few languages are not object-oriented in that sense, and C falls in that category. This does not mean that OOP is impossible in C, only that the language will not help much.

BearSSL makes use of OOP in order to provide a modular internal structure in which implementations can be switched at runtime. This has the double benefit of allowing specialised implementations of various algorithms for better performance on some architectures, and of avoiding “pulling in” unused code. For instance, when a hash function is used in BearSSL, this is done through an object which is provided by the caller; if using an SSL server with a specialised “SHA-256 only” profile, then the implementations for MD5, SHA-1 and SHA-512 will not be included in the resulting binary.

Hash Functions

Hash functions implemented in BearSSL are a good example of how OOP is implemented and used.

Classical API

Let’s take a hash function like SHA-1. Internally, data is processed by blocks of 64 bytes, and we want to be able to process data chunk by chunk, without having the whole of it in RAM at any time; thus, a context structure will be needed, to keep the current partial block, and the result of processing all previous blocks. In BearSSL, this context structure is called br_sha1_context and is defined in bearssl_hash.h (calling code just includes <bearssl.h>). The structure contents are nominally opaque (application code should not access fields directly) but are still defined in the header file so that the caller may allocate the context where appropriate (statically, in the heap, on the stack…).

To set the context contents for a new SHA-1 computation, use br_sha1_init():

br_sha1_context sc;

br_sha1_init(&sc);

Then input data chunk by chunk:

br_sha1_update(&sc, data1, len1);
br_sha1_update(&sc, data2, len2);

where each chunk consists of some bytes (possibly zero). The second parameter to br_sha1_update() has type const void *, so it can point to just about anything in memory (including static, constant data), but in practice the data will often be an array of unsigned char.
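The same init/update/out calling pattern can be sketched without BearSSL at all. The following is a minimal, hypothetical example: the toy_fnv_* names are invented for illustration, and a 32-bit FNV-1a checksum (not a cryptographic hash) stands in for SHA-1. What it shows is that the context carries everything between calls, so data can be injected in arbitrary chunks, including empty ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context for a toy 32-bit FNV-1a checksum (not BearSSL). */
typedef struct {
        uint32_t h;   /* running value; all state lives in the context */
} toy_fnv_context;

/* Reset the context for a new computation. */
static void
toy_fnv_init(toy_fnv_context *ctx)
{
        ctx->h = 2166136261u;
}

/* Inject 'len' bytes from 'data'; 'len' may be zero. */
static void
toy_fnv_update(toy_fnv_context *ctx, const void *data, size_t len)
{
        const unsigned char *buf = data;
        size_t u;

        for (u = 0; u < len; u ++) {
                ctx->h = (ctx->h ^ buf[u]) * 16777619u;
        }
}

/* Write the 4-byte output (big-endian); the context is only read. */
static void
toy_fnv_out(const toy_fnv_context *ctx, void *out)
{
        unsigned char *dst = out;

        dst[0] = (unsigned char)(ctx->h >> 24);
        dst[1] = (unsigned char)(ctx->h >> 16);
        dst[2] = (unsigned char)(ctx->h >> 8);
        dst[3] = (unsigned char)ctx->h;
}
```

Hashing "hello world" in a single call, or as "hello", "" and " world" in three calls, leaves the context in the same state and thus yields the same output.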

When the output of the function is needed (20 bytes, since this is SHA-1), the br_sha1_out() function can be used:

unsigned char tmp[20];

br_sha1_out(&sc, tmp);

That function is declared with the following prototype:

void br_sha1_out(const br_sha1_context *ctx, void *out);

In particular, the context is declared const, which means that it won’t be modified by br_sha1_out(): the function computes the SHA-1 output corresponding to all bytes injected since the last call to br_sha1_init() on that context, handling all the details of SHA-1 processing termination (padding and so on), but without actually terminating the context. This is handy to compute “partial SHA-1 hashes”; in particular, in an SSL/TLS handshake, we need to compute the hash of all preceding handshake messages at several points of the handshake, thereby requiring such partial hashes.
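To see why a const output function is enough for partial hashes, here is a hypothetical toy (not BearSSL code) whose finalisation step, standing in for real padding, mixes the byte count into a local copy of the running value. Since the caller's context is only read, the same context can be finalised at any point and then fed more data. For brevity, toy_out() returns the value instead of writing it to a buffer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical toy context (not BearSSL): a running FNV-1a value plus
 * a byte counter, loosely mirroring the running state and the 'count'
 * field of a real hash context.
 */
typedef struct {
        uint32_t h;
        uint64_t count;
} toy_ctx;

static void
toy_init(toy_ctx *ctx)
{
        ctx->h = 2166136261u;
        ctx->count = 0;
}

static void
toy_update(toy_ctx *ctx, const void *data, size_t len)
{
        const unsigned char *buf = data;
        size_t u;

        for (u = 0; u < len; u ++) {
                ctx->h = (ctx->h ^ buf[u]) * 16777619u;
        }
        ctx->count += len;
}

/*
 * Finalisation works on a local copy of the state: the total length is
 * mixed in (a stand-in for real padding), and the caller's context is
 * never written, hence the 'const' qualifier.
 */
static uint32_t
toy_out(const toy_ctx *ctx)
{
        uint32_t h = ctx->h;
        int i;

        for (i = 0; i < 8; i ++) {
                h = (h ^ (uint32_t)((ctx->count >> (8 * i)) & 0xFF)) * 16777619u;
        }
        return h;
}
```

The partial value obtained after "msg1" equals what a fresh context would give for "msg1" alone, and continuing with "msg2" afterwards gives the same result as hashing "msg1msg2" in one go.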

The context is allocated by the caller, and disposed of by the caller, in any way it sees fit. The implementation is fully reentrant and thread-safe, in that no data is altered outside of the context structure, so several threads or signal handlers can use the implementation concurrently with no need for any synchronisation, as long as they work over distinct context structures (of course, if two threads try to inject bytes into the same context structure, then some sort of synchronisation will be necessary).

Two other functions (br_sha1_state() and br_sha1_set_state()) are defined; they give access to the internal hash function state and are used to support constant-time HMAC verification (in relation to the padding used in SSL/TLS records).
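The precise contract of br_sha1_state() and br_sha1_set_state() is documented in bearssl_hash.h; the sketch below only illustrates the general idea with a hypothetical toy context: export the running value together with the byte count, then resume the computation in another context from that snapshot. Real hash implementations may impose extra conditions on when a state can be exported or restored, which this toy ignores.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy context (not BearSSL): running value + byte count. */
typedef struct {
        uint32_t h;
        uint64_t count;
} toy_ctx;

static void
toy_init(toy_ctx *ctx)
{
        ctx->h = 2166136261u;
        ctx->count = 0;
}

static void
toy_update(toy_ctx *ctx, const void *data, size_t len)
{
        const unsigned char *buf = data;
        size_t u;

        for (u = 0; u < len; u ++) {
                ctx->h = (ctx->h ^ buf[u]) * 16777619u;
        }
        ctx->count += len;
}

/* Export the running value (4 bytes, big-endian); return the byte count. */
static uint64_t
toy_state(const toy_ctx *ctx, void *dst)
{
        unsigned char *buf = dst;

        buf[0] = (unsigned char)(ctx->h >> 24);
        buf[1] = (unsigned char)(ctx->h >> 16);
        buf[2] = (unsigned char)(ctx->h >> 8);
        buf[3] = (unsigned char)ctx->h;
        return ctx->count;
}

/* Resume a computation from an exported value and byte count. */
static void
toy_set_state(toy_ctx *ctx, const void *stb, uint64_t count)
{
        const unsigned char *buf = stb;

        ctx->h = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
                | ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
        ctx->count = count;
}
```

Exporting after "abc", restoring into a fresh context and then injecting "def" gives the same state as hashing "abcdef" directly.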

Object API

If we look at the (nominally opaque) contents of the br_sha1_context structure, we find the following:

typedef struct {
        const br_hash_class *vtable;
        unsigned char buf[64];
        uint64_t count;
        uint32_t val[5];
} br_sha1_context;

The buf[] array stores the data for the current block. The count field contains the number of bytes injected since the last call to br_sha1_init(). The val[] array contains the running state of SHA-1, which consists of five 32-bit words.
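The interplay of buf[], count and val[] can be illustrated with a hypothetical toy context that uses 8-byte blocks and a trivial, non-cryptographic block function instead of SHA-1's 64-byte blocks and compression function. In this sketch, the number of bytes waiting in buf[] is not stored separately: it is derived from count modulo the block size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_LEN 8   /* hypothetical; SHA-1 uses 64-byte blocks */

/*
 * Hypothetical toy context (not BearSSL) with the same three roles as
 * br_sha1_context: a partial-block buffer, a byte counter, and a
 * running state updated once per complete block.
 */
typedef struct {
        unsigned char buf[TOY_BLOCK_LEN];
        uint64_t count;
        uint32_t val;
} toy_block_context;

/* Toy "compression function": fold one complete block into the state. */
static void
toy_block(uint32_t *val, const unsigned char *block)
{
        size_t u;

        for (u = 0; u < TOY_BLOCK_LEN; u ++) {
                *val = (*val * 31u) + block[u];
        }
}

static void
toy_block_init(toy_block_context *ctx)
{
        ctx->count = 0;
        ctx->val = 1;
}

static void
toy_block_update(toy_block_context *ctx, const void *data, size_t len)
{
        const unsigned char *src = data;

        while (len > 0) {
                /* Bytes already waiting in buf[]. */
                size_t ptr = (size_t)(ctx->count % TOY_BLOCK_LEN);
                size_t clen = TOY_BLOCK_LEN - ptr;

                if (clen > len) {
                        clen = len;
                }
                memcpy(ctx->buf + ptr, src, clen);
                src += clen;
                len -= clen;
                ctx->count += clen;
                if (ptr + clen == TOY_BLOCK_LEN) {
                        /* Block complete: process it; buf[] is reusable. */
                        toy_block(&ctx->val, ctx->buf);
                }
        }
}
```

With a 19-byte input, two full blocks are folded into val and the last 3 bytes remain in buf[], whether the input arrives in one call or one byte at a time.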

The first field of the structure is called vtable, and it embodies the OOP API. It is defined as a pointer to a (constant) br_hash_class structure. That structure is defined as follows:

typedef struct br_hash_class_ br_hash_class;
struct br_hash_class_ {
        size_t context_size;
        uint32_t desc;
        void (*init)(const br_hash_class **ctx);
        void (*update)(const br_hash_class **ctx, const void *data, size_t len);
        void (*out)(const br_hash_class *const *ctx, void *dst);
        uint64_t (*state)(const br_hash_class *const *ctx, void *dst);
        void (*set_state)(const br_hash_class **ctx,
                const void *stb, uint64_t count);
};

A br_hash_class structure represents an implementation of a hash function, with function pointers for the various operations that mimic the classical API (init(), update(), out()…). The various fields are the following:

context_size: the size (in bytes) of the context structure for this implementation. This size will be used by the caller, if the caller wishes to use dynamic allocation.
desc: a 32-bit field that encodes some information on the hash function (symbolic identifier, size of output, size of state, size of internal blocks, and Merkle-Damgård padding type).
init(): the initialisation function, to reset the context contents.
update(): the update function, to inject some more bytes.
out(): the out function, which finalises the current computation and computes the hash output. Here again, the context is not modified, so partial hashes are easy to compute.
state() and set_state() correspond to the access to the internal hash state.
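A stripped-down analogue of this structure, with one toy implementation, might look as follows. This is a hypothetical sketch: the toy_* names are invented, out_len replaces the bit fields packed into desc, and state()/set_state() are left out. Note how each method receives a pointer to the vtable field and converts it to a pointer to its own context type.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified analogue of br_hash_class (not BearSSL's
 * definition): 'out_len' stands in for the bit fields packed in 'desc'.
 */
typedef struct toy_hash_class_ toy_hash_class;
struct toy_hash_class_ {
        size_t context_size;
        size_t out_len;
        void (*init)(const toy_hash_class **ctx);
        void (*update)(const toy_hash_class **ctx, const void *data, size_t len);
        void (*out)(const toy_hash_class *const *ctx, void *dst);
};

/* Toy implementation: 16-bit sum of bytes. The vtable comes first. */
typedef struct {
        const toy_hash_class *vtable;
        uint16_t sum;
} toy_sum16_context;

static void toy_sum16_init(const toy_hash_class **ctx);

static void
toy_sum16_update(const toy_hash_class **ctx, const void *data, size_t len)
{
        /* Valid because vtable is the first field of the context. */
        toy_sum16_context *sc = (toy_sum16_context *)ctx;
        const unsigned char *buf = data;

        while (len -- > 0) {
                sc->sum = (uint16_t)(sc->sum + *buf ++);
        }
}

static void
toy_sum16_out(const toy_hash_class *const *ctx, void *dst)
{
        const toy_sum16_context *sc = (const toy_sum16_context *)ctx;
        unsigned char *out = dst;

        out[0] = (unsigned char)(sc->sum >> 8);
        out[1] = (unsigned char)sc->sum;
}

/* The "class": one constant instance shared by all contexts. */
static const toy_hash_class toy_sum16_vtable = {
        sizeof(toy_sum16_context),
        2,
        &toy_sum16_init,
        &toy_sum16_update,
        &toy_sum16_out
};

static void
toy_sum16_init(const toy_hash_class **ctx)
{
        toy_sum16_context *sc = (toy_sum16_context *)ctx;

        sc->vtable = &toy_sum16_vtable;
        sc->sum = 0;
}
```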

The vtable field is so named by analogy with a classical OOP implementation technique called a Virtual Method Table. A very important point here is that the vtable field is the first field of the context structure; this relies on the fact that in C, a pointer to a structure can be used as a pointer to its first field, and vice versa, provided that suitable type casts are employed (the C standard guarantees that).
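The C rule in question (a pointer to a structure, suitably converted, points to its initial member, and vice versa) can be checked directly. The types below are hypothetical; they only reproduce the shape of a context whose first field is a vtable pointer.

```c
#include <stddef.h>

/* Hypothetical "class" and "context" types, for illustration only. */
typedef struct {
        const char *name;
} toy_class;

typedef struct {
        const toy_class *vtable;   /* must be the first field */
        unsigned count;
} toy_ctx;

/* What a method does: it receives the address of the vtable field... */
static toy_ctx *
ctx_from_vtable_field(const toy_class **pvt)
{
        /* ...and converts it back into a pointer to the whole context. */
        return (toy_ctx *)pvt;
}

/* The converse conversion: from the whole structure to its first field. */
static const toy_class **
vtable_field_of(toy_ctx *ctx)
{
        return (const toy_class **)ctx;
}
```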

Let’s suppose that we want to write a function that hashes a data element but is generic, i.e. it works with any hash function. Such a function, in the BearSSL API, would look like this:

/*
 * Compute the hash of 'data' ('len' bytes) with the hash
 * function 'hf'. Output is written in the buffer pointed to
 * by 'dst'; the hash output length (in bytes) is returned.
 */
size_t
hash_data(const br_hash_class *hf,
        const void *data, size_t len, void *dst)
{
        br_hash_compat_context sc;

        hf->init(&sc.vtable);
        hf->update(&sc.vtable, data, len);
        hf->out(&sc.vtable, dst);
        return (hf->desc >> BR_HASHDESC_OUT_OFF) & BR_HASHDESC_OUT_MASK;
}

Let’s analyse what happens here. The br_hash_compat_context type is defined by BearSSL as:

typedef union {
        const br_hash_class *vtable;
        br_md5_context md5;
        br_sha1_context sha1;
        br_sha224_context sha224;
        br_sha256_context sha256;
        br_sha384_context sha384;
        br_sha512_context sha512;
} br_hash_compat_context;

so it really is a union large enough to accommodate the context structure of any implemented hash function (one at a time, of course). The first field of every one of the context structures (br_md5_context, br_sha1_context,…) is a vtable pointer, and one of the members of the union is itself a vtable pointer (called, appropriately, vtable), so when the caller accesses sc.vtable, it is really accessing, by the magic of the union, the vtable pointer of whatever hash function context structure is actually used.

The hf->init() call initialises the structure as appropriate for the hash function represented by hf. Then the hf->update() and hf->out() calls run the hash function. Finally, the hf->desc field is used to obtain the hash function output size.
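A self-contained version of the generic function, with two toy implementations and a union playing the role of br_hash_compat_context, could be written like this. Everything here is hypothetical and simplified (the checksums are not cryptographic, and out_len stands in for the desc bit fields); the point is that toy_hash_data() never names a concrete context type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified analogue of br_hash_class (not BearSSL). */
typedef struct toy_hash_class_ toy_hash_class;
struct toy_hash_class_ {
        size_t out_len;
        void (*init)(const toy_hash_class **ctx);
        void (*update)(const toy_hash_class **ctx, const void *data, size_t len);
        void (*out)(const toy_hash_class *const *ctx, void *dst);
};

/* Two toy algorithms; each context starts with its vtable pointer. */
typedef struct { const toy_hash_class *vtable; unsigned char x; } toy_xor8_context;
typedef struct { const toy_hash_class *vtable; uint16_t s; } toy_sum16_context;

/* Analogue of br_hash_compat_context: large enough for either context. */
typedef union {
        const toy_hash_class *vtable;
        toy_xor8_context xor8;
        toy_sum16_context sum16;
} toy_compat_context;

static void xor8_init(const toy_hash_class **ctx);
static void sum16_init(const toy_hash_class **ctx);

static void
xor8_update(const toy_hash_class **ctx, const void *data, size_t len)
{
        toy_xor8_context *c = (toy_xor8_context *)ctx;
        const unsigned char *p = data;

        while (len -- > 0) {
                c->x = (unsigned char)(c->x ^ *p ++);
        }
}

static void
xor8_out(const toy_hash_class *const *ctx, void *dst)
{
        ((unsigned char *)dst)[0] = ((const toy_xor8_context *)ctx)->x;
}

static void
sum16_update(const toy_hash_class **ctx, const void *data, size_t len)
{
        toy_sum16_context *c = (toy_sum16_context *)ctx;
        const unsigned char *p = data;

        while (len -- > 0) {
                c->s = (uint16_t)(c->s + *p ++);
        }
}

static void
sum16_out(const toy_hash_class *const *ctx, void *dst)
{
        const toy_sum16_context *c = (const toy_sum16_context *)ctx;
        ((unsigned char *)dst)[0] = (unsigned char)(c->s >> 8);
        ((unsigned char *)dst)[1] = (unsigned char)c->s;
}

static const toy_hash_class toy_xor8_vtable = { 1, xor8_init, xor8_update, xor8_out };
static const toy_hash_class toy_sum16_vtable = { 2, sum16_init, sum16_update, sum16_out };

static void
xor8_init(const toy_hash_class **ctx)
{
        toy_xor8_context *c = (toy_xor8_context *)ctx;
        c->vtable = &toy_xor8_vtable;
        c->x = 0;
}

static void
sum16_init(const toy_hash_class **ctx)
{
        toy_sum16_context *c = (toy_sum16_context *)ctx;
        c->vtable = &toy_sum16_vtable;
        c->s = 0;
}

/* Generic code, with the same shape as hash_data() above. */
static size_t
toy_hash_data(const toy_hash_class *hf,
        const void *data, size_t len, void *dst)
{
        toy_compat_context sc;

        hf->init(&sc.vtable);
        hf->update(&sc.vtable, data, len);
        hf->out(&sc.vtable, dst);
        return hf->out_len;
}
```

The same toy_hash_data() call returns 2 bytes with the sum class and 1 byte with the XOR class, depending only on which vtable is passed in.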

The hf->init() method, apart from initialising the hash function state, also sets the first field (vtable) of its context, so the code above could also be written like this:

size_t
hash_data(const br_hash_class *hf,
        const void *data, size_t len, void *dst)
{
        br_hash_compat_context sc;

        hf->init(&sc.vtable);
        sc.vtable->update(&sc.vtable, data, len);
        sc.vtable->out(&sc.vtable, dst);
        return (hf->desc >> BR_HASHDESC_OUT_OFF) & BR_HASHDESC_OUT_MASK;
}

This can be convenient if context initialisation and usage are done in separate functions: no need to transport the pointer to the vtable, it is already there in the context. This feature leverages the fact that all hash context structures store their respective vtable field at the same place (at the very start of the structure); it thus can be obtained with the vtable field from the union.
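The convenience of reading the vtable back from the context shows up when helper functions receive only a context pointer. In this hypothetical sketch (invented toy_* names, not BearSSL code), feed() and finish() do not know which algorithm they are driving; they dereference the vtable field, which is the toy equivalent of sc.vtable->update().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified analogue of br_hash_class (not BearSSL). */
typedef struct toy_hash_class_ toy_hash_class;
struct toy_hash_class_ {
        void (*init)(const toy_hash_class **ctx);
        void (*update)(const toy_hash_class **ctx, const void *data, size_t len);
        void (*out)(const toy_hash_class *const *ctx, void *dst);
};

/* Toy 16-bit byte sum; the vtable pointer is the first field. */
typedef struct { const toy_hash_class *vtable; uint16_t s; } toy_sum16_context;

static void sum16_init(const toy_hash_class **ctx);

static void
sum16_update(const toy_hash_class **ctx, const void *data, size_t len)
{
        toy_sum16_context *c = (toy_sum16_context *)ctx;
        const unsigned char *p = data;

        while (len -- > 0) {
                c->s = (uint16_t)(c->s + *p ++);
        }
}

static void
sum16_out(const toy_hash_class *const *ctx, void *dst)
{
        const toy_sum16_context *c = (const toy_sum16_context *)ctx;

        ((unsigned char *)dst)[0] = (unsigned char)(c->s >> 8);
        ((unsigned char *)dst)[1] = (unsigned char)c->s;
}

static const toy_hash_class toy_sum16_vtable = { sum16_init, sum16_update, sum16_out };

static void
sum16_init(const toy_hash_class **ctx)
{
        toy_sum16_context *c = (toy_sum16_context *)ctx;

        c->vtable = &toy_sum16_vtable;   /* init() records the vtable */
        c->s = 0;
}

/* Helpers that receive only the context; the vtable is read from it. */
static void
feed(const toy_hash_class **ctx, const void *data, size_t len)
{
        (*ctx)->update(ctx, data, len);
}

static void
finish(const toy_hash_class *const *ctx, void *dst)
{
        (*ctx)->out(ctx, dst);
}
```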

Context Allocation

The br_hash_compat_context union is needed to allow our function to allocate on its stack a structure of the right size. Alternatively, it could have used malloc() to obtain the context from the heap, as shown below:

size_t
hash_data(const br_hash_class *hf,
        const void *data, size_t len, void *dst)
{
        void *sc;

        sc = malloc(hf->context_size);
        if (sc == NULL) {
                /* Do something "intelligent" about the error. */
                fprintf(stderr, "Aaaaaaaargh!\n");
                exit(EXIT_FAILURE);
        }
        hf->init(sc);
        hf->update(sc, data, len);
        hf->out(sc, dst);
        free(sc);
        return (hf->desc >> BR_HASHDESC_OUT_OFF) & BR_HASHDESC_OUT_MASK;
}

Another method would use a “variable-length array”, which is a C99 feature, thus not necessarily available on the compiler you have to use for your target system:

size_t
hash_data(const br_hash_class *hf,
        const void *data, size_t len, void *dst)
{
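        /*
         * Caveat: an unsigned char array has no alignment guarantee
         * beyond 1; real code may need an element type with stricter
         * alignment for the context structure.
         */
        unsigned char sc[hf->context_size];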

        hf->init((void *)sc);
        hf->update((void *)sc, data, len);
        hf->out((void *)sc, dst);
        return (hf->desc >> BR_HASHDESC_OUT_OFF) & BR_HASHDESC_OUT_MASK;
}

The method with malloc() works on every standard compiler (since the first “ANSI C” in 1989) but it requires malloc(), which is not necessarily available in bare embedded systems, and it raises issues about what to do on allocation failure. The method with the variable-length array looks better but needs a C99-aware compiler¹; also, whether it is actually better than malloc() is questionable: the variable-length array is still dynamic allocation, so the question of allocation failure is not solved, only ignored. Indeed, there is no programmatic way to react to an allocation failure at that level. Moreover, the stack size is typically more constrained than the heap size.
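If malloc() is acceptable, the allocation failure can at least be reported to the caller rather than handled with exit(). The following hypothetical variant returns 0 on failure, which cannot be confused with a genuine output length; the toy class and checksum are invented for illustration and are not BearSSL code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified analogue of br_hash_class (not BearSSL). */
typedef struct toy_hash_class_ toy_hash_class;
struct toy_hash_class_ {
        size_t context_size;
        size_t out_len;
        void (*init)(const toy_hash_class **ctx);
        void (*update)(const toy_hash_class **ctx, const void *data, size_t len);
        void (*out)(const toy_hash_class *const *ctx, void *dst);
};

typedef struct { const toy_hash_class *vtable; uint16_t s; } toy_sum16_context;

static void sum16_init(const toy_hash_class **ctx);

static void
sum16_update(const toy_hash_class **ctx, const void *data, size_t len)
{
        toy_sum16_context *c = (toy_sum16_context *)ctx;
        const unsigned char *p = data;

        while (len -- > 0) {
                c->s = (uint16_t)(c->s + *p ++);
        }
}

static void
sum16_out(const toy_hash_class *const *ctx, void *dst)
{
        const toy_sum16_context *c = (const toy_sum16_context *)ctx;
        unsigned char *d = dst;

        d[0] = (unsigned char)(c->s >> 8);
        d[1] = (unsigned char)c->s;
}

static const toy_hash_class toy_sum16_vtable = {
        sizeof(toy_sum16_context), 2, sum16_init, sum16_update, sum16_out
};

static void
sum16_init(const toy_hash_class **ctx)
{
        toy_sum16_context *c = (toy_sum16_context *)ctx;

        c->vtable = &toy_sum16_vtable;
        c->s = 0;
}

/*
 * Heap-allocated variant: returns the output length, or 0 if the
 * context could not be allocated (0 is never a valid output length).
 */
static size_t
toy_hash_data_heap(const toy_hash_class *hf,
        const void *data, size_t len, void *dst)
{
        void *sc = malloc(hf->context_size);   /* suitably aligned */

        if (sc == NULL) {
                return 0;
        }
        hf->init(sc);
        hf->update(sc, data, len);
        hf->out(sc, dst);
        free(sc);
        return hf->out_len;
}
```

This moves the decision about allocation failure up to the caller, but it does not remove the dependency on malloc() discussed above.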

Conceptually, since the function must allocate a context structure, there is necessarily a backward compatibility issue if a newer hash function is implemented and requires a bigger context. The union at least makes that problem explicit, and allows the compiler to diagnose an oversized stack allocation at compile-time.

OOP Concepts

The three main concepts of OOP are encapsulation, inheritance and polymorphism. At least so goes the classical OOP theory; as with any other theory in computer science, that terminology is still debated, sometimes hotly.

Encapsulation

Encapsulation is about hiding data within the object. An object is data that comes with the methods used to process it; encapsulation is about using only these methods.

In an OOP language like Java, encapsulation can be enforced: make your fields private and other classes won’t be able to read or write them without going through the defined methods². In C, since there is no language-supported OOP construction, encapsulation is based on discipline: the application developer refrains from accessing fields directly.

For most fields that are meant to be accessed, BearSSL tends to define accessor functions, which are defined directly in the header files as static inline: they thus have function-like semantics (i.e. less troublesome to use than macros) while still being optimised away by any decent C compiler.
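As a hypothetical illustration (the names are invented, not taken from BearSSL), such an accessor might look like this in a header file; the macro alternative shown for comparison works too, but does not type-check its argument.

```c
#include <stdint.h>

/*
 * Hypothetical context (not BearSSL): the fields are visible so that
 * callers can allocate the structure, but are nominally opaque.
 */
typedef struct {
        uint64_t count;
        uint32_t val;
} toy_context;

/*
 * Accessor as it would appear in a header file: 'static inline' gives
 * function semantics (the argument is type-checked and evaluated once),
 * and an optimising compiler will typically reduce a call to a plain
 * field load.
 */
static inline uint64_t
toy_get_count(const toy_context *ctx)
{
        return ctx->count;
}

/*
 * Macro alternative, for comparison: it accepts a pointer to any
 * structure that happens to have a 'count' field.
 */
#define TOY_GET_COUNT(ctx)   ((ctx)->count)
```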

Polymorphism

Polymorphism is the ability to handle several objects of different types in the same way. We saw that with hash functions: an initialised context can be used generically, because:

All hash functions put the pointer to the vtable in the first field of the context structure.
All vtables for the various hash function implementations have compatible layouts; for instance, the update() method pointer is always located at the same offset relative to the start of the vtable.

We also saw that when it comes to context allocation, polymorphism is not complete: the caller must mind the size. This is a consequence of our insistence on not using dynamic allocation.

Inheritance

Inheritance allows a type to be defined over another type, as a variant thereof. This is not used in hash functions, but an example can be found in the SSL record handling objects: the br_sslrec_in_cbc_class type is defined as a structure whose first field is a br_sslrec_in_class. This allows br_sslrec_in_cbc_class to define its own extra method (init()) while ensuring that object instances using such a vtable can still be used by code that expects an instance of br_sslrec_in_class.

This inheritance mechanism again uses the guarantees offered by the C standard, namely interchangeability of a structure with its first field.
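The pattern can be sketched with hypothetical types: a derived class whose first field is a base class, plus one extra init() method. Generic code that only knows the base class can still call the inherited methods, while code aware of the derived class recovers it by converting the base pointer back, which relies on the same first-field guarantee. BearSSL's own declarations may differ in their details; in particular, this sketch types the context's vtable field as a pointer to the base class to stay clear of C aliasing rules.

```c
#include <stddef.h>

/* Hypothetical base class (an analogue, not br_sslrec_in_class). */
typedef struct toy_base_class_ toy_base_class;
struct toy_base_class_ {
        size_t context_size;
        int (*describe)(const toy_base_class *const *ctx);
};

/* Derived class: the base class is its FIRST field, extra methods follow. */
typedef struct toy_derived_class_ toy_derived_class;
struct toy_derived_class_ {
        toy_base_class inner;
        void (*init)(const toy_base_class **ctx, int param);
};

/* Context for the derived implementation; vtable pointer first. */
typedef struct {
        const toy_base_class *vtable;   /* points to toy_derived_vtable.inner */
        int param;
} toy_derived_context;

static void derived_init(const toy_base_class **ctx, int param);

static int
derived_describe(const toy_base_class *const *ctx)
{
        return ((const toy_derived_context *)ctx)->param;
}

static const toy_derived_class toy_derived_vtable = {
        { sizeof(toy_derived_context), derived_describe },
        derived_init
};

static void
derived_init(const toy_base_class **ctx, int param)
{
        toy_derived_context *c = (toy_derived_context *)ctx;

        c->vtable = &toy_derived_vtable.inner;
        c->param = param;
}

/* Generic code: knows only the base class layout. */
static int
use_base(const toy_base_class *const *ctx)
{
        return (*ctx)->describe(ctx);
}

/* Derived-aware code: recovers the derived vtable from its first field. */
static const toy_derived_class *
derived_class_of(const toy_base_class *const *ctx)
{
        return (const toy_derived_class *)*ctx;
}
```

After toy_derived_vtable.init(&c.vtable, 17), use_base(&c.vtable) returns 17 even though use_base() has never heard of the derived class.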

¹ Older compilers might use alloca(), but its availability is not guaranteed and it is not “standard C”, only “traditional Unix C”.
² Unless they cheat.