Nathan McMillan
Projects · Music · Photos · Resume

Reference Counting Memory Management

12 January 2022

Today we're going to take a look at how a scripting language can implement reference counting.

First we'll need to do is define our value types. To keep things simple, we'll have two primitives and two object types.

enum ValueType {
    VALUE_BOOL,
    VALUE_INTEGER,
    VALUE_ARRAY,
    VALUE_STRING,
};

We will use the combination of an enum plus a union in our struct so we can keep track of our value's type, and efficiently pack the data.

struct Value {
    enum ValueType is;
    union {
        bool b;
        int i;
        Object *o;
    } as;
};

Our struct holds a pointer to an object. Our objects will live on the heap. For this article, our Object will simply contain an integer `count` to keep track of how many references are holding it.

As long as our object types hold an object struct as the first field in their own struct, we can safely down cast to an Object.

struct Object {
    int count;
};
struct String {
    Object object;
    char *chars;
};
struct Array {
    Object object;
    Value *items;
    int length;
    int capacity;
};

Blah

void reference(Value value) {
    switch (value.is) {
    case VALUE_ARRAY:
    case VALUE_STRING:
        Object *object = (Object *)(value).as.o;
        object->count++;
    default:
        return;
    }
}

Blah

void dereference(Value value) {
    switch (value.is) {
    case VALUE_ARRAY: {
        Array *array = (Array *)(value).as.o;
        int count = array->object.count--;
        if (count == 0) {
            free(array->items);
            free(array);
        }
        return;
    }
    case VALUE_STRING: {
        String *string = (String *)(value).as.o;
        int count = string->object.count--;
        if (count == 0) {
            free(string->chars);
            free(string);
        }
        return;
    }
    default:
        return;
    }
}

Now any time there's something that involves looking at a value, we will need to reference and dereference appropriately.

For example, imagine a virtual machine with an `ADD` operation for two values. It might look like this

case OP_ADD: {
    Value b = pop(H);
    Value a = pop(H);
    if (is_int(a)) {
        if (is_int(b)) {
            a.as.i += b.as.i;
            push(H, a);
        }
    } else if (is_string(a)) {
        push_intern_string(H, value_concat(a, b));
    }
    dereference(H, a);
    dereference(H, b);
    break;
}

Typically, many references and dereferences will be due to pushing and popping from the stack. However, we'll need to keep in mind values that reference other values. For instance, our array object will need to push values to it.

case OP_ARRAY_PUSH: {
    break;
}

Something about optimizations

For a working implementation of this code, check out Hymn Script. A small intrepreted scripting language I wrote!

Reddit · Hacker News