Concurrent Skip Lists in C++

University Project at TU Wien | Advanced Multiprocessor Programming (184.726)

This project focuses on the implementation, benchmarking, and performance analysis of concurrent skip lists using C++ with OpenMP and Pthreads. The objectives are to develop a sequential skip list as a baseline and explore three concurrent implementations: coarse-grained, fine-grained, and lock-free.

Team Members

  • Florian Frech
  • Tobias Jechtl
  • Lorenz Moser

Implementations

We explore four distinct approaches to synchronizing access to the same probabilistic data structure:

  1. Sequential (sequential)

    • A standard, non-thread-safe Skip List.
    • Serves as the baseline for evaluating the performance of the concurrent versions.
    • Complexity: expected O(log n) time for the fundamental operations.
  2. Coarse-Grained Locking (coarse_grained)

    • Serializes all access through a single global OpenMP lock (see the first sketch after this list).
    • Thread safety follows directly, since every operation acquires the same lock.
    • Scalability is limited under contention because all threads are serialized.
  3. Fine-Grained Locking (fine_grained)

    • Implements "Lazy Synchronization" using hand-over-hand locking with pthread_mutex.
    • Locks individual nodes and the links between them rather than the whole list, reducing contention and improving performance.
    • Uses a two-step deletion: Logical marking followed by physical unlinking.
  4. Lock-Free (lock_free)

    • Based on Herlihy & Shavit's design using Atomic Markable References (a C++ sketch of this primitive follows the list).
    • Relies on atomic compare-and-swap (CAS) operations to synchronize without locks.
    • Remains correct under concurrent modification and, being lock-free, guarantees that some thread always makes progress.
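
For orientation, the coarse-grained variant can be pictured as a thin locking wrapper around the sequential list. The sketch below is illustrative only and not the code in src/: std::set stands in for the sequential skip list so the example compiles on its own, and the class name is a placeholder.

    #include <omp.h>
    #include <set>

    // Illustrative coarse-grained pattern: one global OpenMP lock serializes
    // every operation. std::set stands in for the sequential skip list so the
    // sketch is self-contained; CoarseLockedSet is a placeholder name.
    class CoarseLockedSet {
        std::set<int> data_;   // stand-in for the sequential skip list
        omp_lock_t lock_;

    public:
        CoarseLockedSet()  { omp_init_lock(&lock_); }
        ~CoarseLockedSet() { omp_destroy_lock(&lock_); }

        bool insert(int key) {
            omp_set_lock(&lock_);
            bool inserted = data_.insert(key).second;
            omp_unset_lock(&lock_);
            return inserted;
        }

        bool contains(int key) {
            omp_set_lock(&lock_);
            bool found = data_.count(key) > 0;
            omp_unset_lock(&lock_);
            return found;
        }

        // remove() follows the same lock / operate / unlock pattern.
    };

Because every thread funnels through the same lock, scalability is bounded by that lock, which is exactly the contention effect visible in the coarse-grained results below.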

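C++ has no direct counterpart to the AtomicMarkableReference used in Herlihy & Shavit's lock-free list, so a common way to realize it is to pack the mark bit into the least significant bit of the successor pointer and update pointer and mark with one compare-and-swap. The sketch below illustrates that idea only; it is not taken from src/, and all names are placeholders. In the lazy (fine-grained) scheme, the corresponding logical mark is simply a boolean flag updated while the node's lock is held.

    #include <atomic>
    #include <cstdint>

    // Illustrative "atomic markable reference": the deletion mark lives in the
    // least significant pointer bit, which is free because node objects are at
    // least 2-byte aligned. Pointer and mark are read and CAS'd as one word.
    template <typename Node>
    class MarkableRef {
        std::atomic<std::uintptr_t> bits_{0};

        static std::uintptr_t pack(Node* ptr, bool mark) {
            return reinterpret_cast<std::uintptr_t>(ptr) |
                   static_cast<std::uintptr_t>(mark);
        }

    public:
        // Load pointer and mark together in a single atomic read.
        Node* get(bool& mark) const {
            std::uintptr_t b = bits_.load(std::memory_order_acquire);
            mark = (b & 1u) != 0;
            return reinterpret_cast<Node*>(b & ~std::uintptr_t{1});
        }

        // One CAS over (pointer, mark): insert keeps the mark false, a logical
        // delete flips it to true, and physical unlinking later swings the
        // predecessor's reference past the marked node.
        bool compare_and_set(Node* expected, Node* desired,
                             bool expected_mark, bool desired_mark) {
            std::uintptr_t exp = pack(expected, expected_mark);
            return bits_.compare_exchange_strong(
                exp, pack(desired, desired_mark), std::memory_order_acq_rel);
        }
    };
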
Project Structure

.
├── assets/                 # Evaluation plots
├── incl/                   # Header files (.hpp)
├── src/                    # Source implementation (.cpp)
├── tests/                  # Integration tests for correctness
├── scripts/                # Python scripts for plotting
├── Makefile                # Build automation
└── README.md               # Documentation

Build & Usage

Prerequisites

  • Compiler: g++ with C++17 support
  • Libraries: OpenMP, pthread
  • Build Tool: Make

Compilation

To build the main executable:

make all

To build and run the correctness tests:

make tests

Running Benchmarks

You can run specific pre-configured benchmark suites using the Makefile:

make bench-sequential
make bench-coarse_grained
make bench-fine_grained
make bench-lock_free

Custom Execution (CLI)

You can run custom scenarios using the generated build/skiplist executable.

Example: Run a lock-free benchmark with 4 threads, a 50/50 insert/remove mix, for 5 seconds:

./build/skiplist --kind lock_free --threads 4 --time-interval 5 --mix 0.5,0.5,0.0 --output-print 1

Parameters:

--kind             Implementation: sequential, coarse_grained, fine_grained, or lock_free (default: sequential)
--threads          Number of concurrent threads (default: 1)
--mix              Operation mix insert,remove,contains (default: 0.1,0.1,0.8)
--key-range        Range of keys, e.g., 0,100000 (default: 0,100000)
--key-range-type   Key distribution: common, disjoint, or per-thread (default: common)
--time-interval    Duration of the benchmark in seconds (default: 1)

Results & Evaluation

The following charts illustrate the Throughput Speed-Up relative to the single-threaded parallel baselines. These results reflect the corrected evaluation of our implementation.
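
Concretely, for an implementation run with p threads the speed-up is

    speed-up(p) = throughput(p) / throughput(1),

where throughput(1) is measured with the same parallel implementation run on a single thread, i.e. each concurrent variant serves as its own baseline.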

Experimental Setup:

  • Duration: 5 seconds per run.
  • Threads: Exponential steps (2^0 to 2^6).
  • Workloads:
    • Read-Heavy: 10% Insert, 10% Remove, 80% Contains (--mix 0.1,0.1,0.8).
    • Balanced: 40% Insert, 40% Remove, 20% Contains (--mix 0.4,0.4,0.2).

1. Coarse-Grained Locking

Coarse Grained Results

Observation: The coarse-grained implementation suffers from significant contention. The speed-up peaks at approximately 2.0x with 4 threads and then degrades as thread count increases, regardless of whether key ranges are common or disjoint.

2. Fine-Grained Locking

Fine Grained Results

Observation: Fine-grained locking demonstrates much better scalability. For the disjoint, read-heavy workload (green line), we observe a speed-up of approximately 45x at 40 threads. However, overhead becomes apparent at very high thread counts (2^6).

3. Lock-Free

Lock-Free Results

Observation: The Lock-Free implementation achieves the highest scalability. In the disjoint, read-heavy scenario (green line), it reaches a speed-up of over 50x. Even in high-contention scenarios (common key range), it maintains an upward trend, outperforming the lock-based approaches.


References

  1. Herlihy, M., & Shavit, N. (2020). The Art of Multiprocessor Programming. Newnes.
  2. Pugh, W. (1990). Skip Lists: A Probabilistic Alternative to Balanced Trees. Communications of the ACM.
  3. OpenMP Architecture Review Board. (2015). OpenMP Application Program Interface Version 4.5.
