Basic data structures, algorithms and utilities for my personal projects.
Increase productivity by vertically integrating more of the programming stack, sacrificing interoperability and ease of collaboration (which are not needed in personal projects). General guidelines:
- Bounds checking must be enabled by default, without syntax overhead.
- Everything is public, nothing is const.
- Data should be bytes — don't imbue data with magic (i.e. no constructors or destructors).
- Prefer simple implementations over general ones.
- No exceptions. Errors are handled by return code, accumulated (local) error statuses, or panics.
- Prefer offsets to pointers.
This code is public to serve as a reference and to enable other people to compile my other projects that use it. If you are reading this and considering using this library in one of your own projects, please reconsider. The code is optimised for my own purposes only. While core components have been extensively tested, bugs do appear. A lot of the value I get from this library is that have written (almost) all the code myself. Documentation is sparse.
90% of data structures are arrays. The other 10% are hashmaps, which are also arrays.
The guiding principle that always holds is: There is no magic. Things do not happen if you do not ask for them to happen.
The most important type is Array<T>. It is an array. It consists of exactly two things: a pointer to the first element, and a size. You access it using operator[], which is bounds-checked. The end.
Importantly, there is no magic. There is no constructor that allocates and there is no destructor that deallocates. The array does not “own” anything. You allocate memory yourself, e.g. by calling array_create. You deallocate memory yourself, by calling array_free. (But you should do either sparingly — whenever you write a manual allocation, you risk writing a bug.) Copying the array does not copy the elements, it just copies the pointer and the size. (You can copy it using array_copy, but that is also rarely a good idea.)
One function that you do want to use a lot is array_subarray. It gives you what is often referred to as a slice, i.e. an array to a contiguous subset of elements of the original array.
If you need a string, use Array<u8> (u8 is an unsigned 8-bit integer). You can write ""_arr to get an Array<u8> literal (e.g. Array<u8> arr = "Hello, World!"_arr).
The second most important type is Array_dyn<T>. This is a dynamic array: it can grow (and shrink, though you rarely need that). It has an Array and a capacity. This is the main form of memory management:
Array_dyn<s64> arr;
array_push(&arr, 1);
// congratulations, you have now allocated memory!Ideally, you have some fixed number of arrays that live thoughout your entire program, so you never free them. That ideal is not always sensible, but you generally want to have fewer dynamic arrays, and you want them to live longer.
For example, instead of using an Array_dyn<Array_dyn<u8>>, which should fill you with dread at the level of indirection and complexity, consider using one Array_dyn<u8> for all the data, and one Array_dyn<s64> for the indices. For example:
Array_dyn<s64> dirs_indices;
Array_dyn<u8> dirs_data;
array_append(dirs_data, "northwestsoutheast"_arr);
array_append(dirs_indices, {0, 5, 9, 14, 18});To get south, you can then call array_subindex(dirs_indices, dirs_data, 2). This is a common pattern (and it has its disadvantages, as well).
Finally, a note on freeing memory. Sometimes (often, even), it is convenient to allocate memory just for the duration of a function. (While this can always be avoided, strict adherence to dogma is not optimal.) If you had a std::vector, it would magically free all the memory when it goes out of scope. For the reasons argued above, Array_dyn does not do that. But if you want that behaviour, you just need to ask for it:
void do_something() {
Array_dyn<u8> temp;
defer { array_free(&temp); };
...
// temp will be freed once the function returns
}(You can do the same with Array.)
Let's start with an example: computing the maximum number of occurrences of any element in the array.
s64 max_occurences(Array<u32> arr) {
Hashmap<s64> count;
defer { hashmap_free(&count); };
count.empty = (u64)1 << 32;
s64 max = 0;
for (u32 i: arr) {
s64* ptr = hashmap_getcreate(&count, i, 0);
*ptr += 1;
if (max < *ptr) max = *ptr;
}
}The hashmap maps 64-bit integers to some arbitrary type. (Why only 64-bit integers as keys? It makes the hashmap simpler, and as it turns out, I have never needed more.)
You need to set a sentinel value to mark empty slots; this should be some value that you know you are not going to insert. (Common choices would be 0, -1 or (u64)1<<63.)
The common operations exist: hashmap_get to query a value, hashmap_set to set a key to a value. But usually you can get away with only one call to a more specialised operation: hashmap_setnew is like set, but asserts that the element does not exist yet. hashmap_getptr queries a key; if it exists, it returns a pointer to the value, otherwise nullptr. hashmap_getcreate initialises the value if it does not exist, and then returns a pointer to it. A common pattern:
Hashmap<s64> cache;
void get_cached_result(u64 param) {
s64* ptr = hashmap_getcreate(&cache, param, -1);
if (*ptr == -1) *ptr = expensive_operation(param);
return *ptr;
}