Skip to content

EdinHg/CuckooHash

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Cuckoo Hashing Comparison

A comprehensive benchmark suite comparing Cuckoo Hashing, Chaining, and Linear Probing hash table implementations in C++17.

Features

  • Multiple Hash Table Implementations:

    • Cuckoo Hashing - O(1) worst-case lookup with configurable number of tables (2-5)
    • Linear Probing - Open addressing with linear collision resolution
    • Chaining - Separate chaining using linked lists
  • Benchmarking Capabilities:

    • Insertion time measurement
    • Positive/negative search performance
    • Probe count statistics (average and max)
    • Rehash threshold analysis
    • Load factor testing (0.50 to 0.90)
  • Dataset Support:

Requirements

  • C++17 compatible compiler (MSVC, GCC, Clang)
  • CMake 3.10 or higher
  • Python 3.x with pandas, matplotlib, numpy (for visualization)
  • Majestic Million dataset - download and place majestic_million.csv in the project root folder

Build

mkdir build
cd build
cmake ..
cmake --build . --config Release

Executables

The project builds three executables:

Executable Description
BasicTest Main benchmark comparing all three hash table implementations
ComprehensiveTest Extended benchmark with load factors 0.50-0.90, outputs CSV results
CuckooRehashTest Analyzes rehash thresholds for Cuckoo with 2-5 tables

Usage

Basic Test

./Release/BasicTest.exe

Comprehensive Test

./Release/ComprehensiveTest.exe

Outputs: benchmark_results.csv

Cuckoo Rehash Analysis

./Release/CuckooRehashTest.exe

Outputs: cuckoo_rehash_results.csv

Visualize Results

python plot_benchmark_results.py  # Visualize comprehensive benchmark results
python plot_rehash_results.py     # Visualize rehash threshold analysis

Outputs figures to: paper_figures/

Project Structure

├── CMakeLists.txt              # Build configuration
├── README.md
├── majestic_million.csv        # Real-world domain dataset
├── benchmark_results.csv       # Output from ComprehensiveTest
├── cuckoo_rehash_results.csv   # Output from CuckooRehashTest
├── plot_benchmark_results.py   # Python visualization for benchmarks
├── plot_rehash_results.py      # Python visualization for rehash analysis
├── paper_figures/              # Generated figures for research paper
├── build/                      # CMake build directory
├── include/
│   ├── HashTableBase.hpp       # Abstract base class with MurmurHash3
│   ├── CuckooHashTable.hpp     # Cuckoo hashing (2-5 tables + stash)
│   ├── LinearProbingHashTable.hpp
│   ├── ChainingHashTable.hpp
│   └── LinkedList.hpp          # Linked list for chaining
└── src/
    ├── basic_test.cpp          # Basic benchmark runner
    ├── comprehensive_test.cpp  # Extended benchmarks with CSV output
    └── cuckoo_rehash_test.cpp  # Rehash threshold analysis

Implementation Details

Cuckoo Hash Table

  • Configurable number of hash tables (default: 2)
  • Stash buffer for overflow handling
  • Automatic rehashing when insertion fails
  • Uses MurmurHash3 with different seeds per table

Hash Function

All implementations use MurmurHash3 (32-bit) for consistent hashing across string and integer keys.

License

MIT

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •