A comprehensive benchmark suite comparing Cuckoo Hashing, Chaining, and Linear Probing hash table implementations in C++17.
-
Multiple Hash Table Implementations:
- Cuckoo Hashing - O(1) worst-case lookup with configurable number of tables (2-5)
- Linear Probing - Open addressing with linear collision resolution
- Chaining - Separate chaining using linked lists
-
Benchmarking Capabilities:
- Insertion time measurement
- Positive/negative search performance
- Probe count statistics (average and max)
- Rehash threshold analysis
- Load factor testing (0.50 to 0.90)
-
Dataset Support:
- Real-world data via Majestic Million CSV
- Synthetic URL generation for testing
- C++17 compatible compiler (MSVC, GCC, Clang)
- CMake 3.10 or higher
- Python 3.x with pandas, matplotlib, numpy (for visualization)
- Majestic Million dataset - download and place
majestic_million.csvin the project root folder
mkdir build
cd build
cmake ..
cmake --build . --config ReleaseThe project builds three executables:
| Executable | Description |
|---|---|
BasicTest |
Main benchmark comparing all three hash table implementations |
ComprehensiveTest |
Extended benchmark with load factors 0.50-0.90, outputs CSV results |
CuckooRehashTest |
Analyzes rehash thresholds for Cuckoo with 2-5 tables |
./Release/BasicTest.exe./Release/ComprehensiveTest.exeOutputs: benchmark_results.csv
./Release/CuckooRehashTest.exeOutputs: cuckoo_rehash_results.csv
python plot_benchmark_results.py # Visualize comprehensive benchmark results
python plot_rehash_results.py # Visualize rehash threshold analysisOutputs figures to: paper_figures/
├── CMakeLists.txt # Build configuration
├── README.md
├── majestic_million.csv # Real-world domain dataset
├── benchmark_results.csv # Output from ComprehensiveTest
├── cuckoo_rehash_results.csv # Output from CuckooRehashTest
├── plot_benchmark_results.py # Python visualization for benchmarks
├── plot_rehash_results.py # Python visualization for rehash analysis
├── paper_figures/ # Generated figures for research paper
├── build/ # CMake build directory
├── include/
│ ├── HashTableBase.hpp # Abstract base class with MurmurHash3
│ ├── CuckooHashTable.hpp # Cuckoo hashing (2-5 tables + stash)
│ ├── LinearProbingHashTable.hpp
│ ├── ChainingHashTable.hpp
│ └── LinkedList.hpp # Linked list for chaining
└── src/
├── basic_test.cpp # Basic benchmark runner
├── comprehensive_test.cpp # Extended benchmarks with CSV output
└── cuckoo_rehash_test.cpp # Rehash threshold analysis
- Configurable number of hash tables (default: 2)
- Stash buffer for overflow handling
- Automatic rehashing when insertion fails
- Uses MurmurHash3 with different seeds per table
All implementations use MurmurHash3 (32-bit) for consistent hashing across string and integer keys.
MIT