A collection of different implementations of the hash table data structure. They mostly differ by their strategies of collection resolution. Some implementations are written in Python well others in C.
Files: chaining.c & chaining.py
Each bucket is comprised of a linked list. When a collision occurs, a new node is added to the end of the linked list chain.
The entire chain must be searched for most operations, but the length of each chain is expected to be short, therefor has an amortized time of
File: perfectOrdered_hashTable.c
Whenever there is a collision, all items are re-hashed with a separate seed.
Repeat until there are no more collisions.
If it fails to find a seed without a collision after ~100 attempts, the table’s size is double.
Guaranteed worst-case
NOTE: Not working properly. Gets stuck in an infinite loop with modest input sizes
File: cuckoo_hashing.py
A form of perfect hashing where each item has two potential tables they can occupy. During collisions, the new item evicts the current occupant to its alternate position in the other table. This process is repeated until an empty bucket is found, or if too many evictions occur. The table is doubled in size in the case of the latter.
This ensures that each item has only two potential locations, so guarantees a
File: RobinHood.py
A form of open addressing where the distance to their original bucket is minimized on average by "taking from the rich".
Works by having all items store their offset to the bucket they were originally hashed to. When a new item is inserted but there is a collision, it keeps searching adjacent buckets until either an open bucket is found, or another item has a lesser offset. The item to add is exchanged with the lesser offset one, and the process is repeated. This minimizes the average distance from the original bucket, allowing for relatively fast lookup, deletion, and insertion, even with a very high load factor.
Files: hashFunction_compare.c & hashFunction_compare_OLD.py
Measures how well distributed different hash functions are.
File: Bloomfilter.py
A hash table-like probabilistic data structure that uses a bit array to filter out all true positives. Works by deliberately allowing collisions but assigning multiple buckets/bits per item. This means that for a false positive to occur, all the buckets that one item is hashed to must already be occupied.
A lot more memory efficient than a hash table with similar performance, because the key to an item doesn’t have to be stored and only a single bit is needed to represent an occupied bucket.
File: CntBloomFilter.c
Has the ability count the number of occurrences of a each item. This specific implementation will dynamically re-size the structure to prevent buckets from overflowing. Uses a larger, intermediate type to store bucket values when performing comparisons.