
# Benchmark Analysis

io.shift edited this page Aug 6, 2025 · 4 revisions

The performance of cl-freelock has been measured in a controlled environment on a local development machine to establish a clear performance profile. The results show how different queue designs affect throughput and memory efficiency under concurrent load.


## Local Development Machine (Multi-Core)

These benchmarks represent a real-world scenario on a modern laptop.

### Test Environment

  • Processor: Intel(R) Core(TM) Ultra 7 155H (16 Cores, 22 Logical Processors)

  • Memory: 32.0 GB RAM

  • Execution Environment: Arch Linux on WSL2 (Host OS: Windows 11 Home)

  • Power State: Plugged in

### Competitor Comparison (Bounded Queue)

This table directly compares the lock-free bounded queue against traditional lock-based approaches on a multi-core machine.

| Benchmark (1M items) | Throughput (ops/sec) | vs. Lock-Based | vs. oconnore/queues |
|---|---|---|---|
| cl-freelock (1P/1C) | ~3.8M | 1.7x faster | 1.7x faster |
| Lock-based list (1P/1C) | ~2.2M | - | - |
| oconnore/queues (1P/1C) | ~2.2M | - | - |
| cl-freelock (4P/4C) | ~2.9M | 1.5x faster | 2.9x faster |
| Lock-based list (4P/4C) | ~2.0M | - | - |
| oconnore/queues (4P/4C) | ~1.0M | - | - |


Figure 1. MPMC scalability comparison, Intel(R) Core(TM) Ultra 7 155H (16 Cores, 22 Logical Processors).

On a machine with many cores, cl-freelock leads clearly: it outperforms both the lock-based queues and oconnore/queues. As contention increases to 4 producers and 4 consumers, the gap widens sharply, with cl-freelock reaching 2.9x the throughput of oconnore/queues. This demonstrates how well the lock-free algorithm scales on multi-core hardware.
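For context, the lock-based baseline in the table behaves roughly like the sketch below: a generic mutex-guarded list queue written against the portable bordeaux-threads API (this is illustrative, not cl-freelock's actual benchmark code). Because every operation serializes on a single lock, throughput flattens or degrades as producers and consumers are added.

```lisp
;; Illustrative lock-based list queue (assumes bordeaux-threads is loaded,
;; with its conventional BT package nickname). Not cl-freelock code.
(defstruct locked-queue
  (head nil)
  (tail nil)
  (lock (bt:make-lock)))

(defun lq-enqueue (q item)
  ;; Every enqueue takes the global lock -- the scalability bottleneck.
  (bt:with-lock-held ((locked-queue-lock q))
    (let ((cell (cons item nil)))
      (if (locked-queue-tail q)
          (setf (cdr (locked-queue-tail q)) cell)
          (setf (locked-queue-head q) cell))
      (setf (locked-queue-tail q) cell))))

(defun lq-dequeue (q)
  ;; Dequeues contend on the same lock as enqueues.
  (bt:with-lock-held ((locked-queue-lock q))
    (let ((cell (locked-queue-head q)))
      (when cell
        (setf (locked-queue-head q) (cdr cell))
        (unless (locked-queue-head q)
          (setf (locked-queue-tail q) nil))
        (car cell)))))
```

Under 1P/1C the lock is mostly uncontended, so this design keeps pace; at 4P/4C, all eight threads queue up on one mutex, which is exactly the collapse the table shows.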

### Specialized API Performance

The specialized APIs offer another tier of performance for specific use cases.

| Benchmark | Throughput (ops/sec) | Key Takeaway |
|---|---|---|
| SPSC Queue (1P/1C) | ~7.2M | ~1.9x the throughput of the general-purpose MPMC queue (~3.8M) for this use case. |
| Bounded Queue (Batch of 64, 8P/8C) | ~34.1M | A roughly order-of-magnitude speedup for bulk operations. |

Figure 2. Bounded queue comparison, Intel(R) Core(TM) Ultra 7 155H (16 Cores, 22 Logical Processors).
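The SPSC speedup comes from a structural guarantee: with exactly one producer and one consumer, each index has a single writer, so no compare-and-swap retry loop is needed. The single-threaded sketch below shows the core idea with hypothetical names (illustrative only; a real concurrent SPSC queue, cl-freelock's included, additionally needs correctly ordered atomic loads and stores when publishing the indices).

```lisp
;; Illustrative SPSC ring buffer. HEAD is written only by the consumer,
;; TAIL only by the producer -- single-writer indices need no CAS.
(defstruct (spsc (:constructor %make-spsc))
  (buffer (vector) :type simple-vector)
  (mask 0 :type fixnum)
  (head 0 :type fixnum)   ; consumer-owned index
  (tail 0 :type fixnum))  ; producer-owned index

(defun make-spsc (capacity)
  ;; CAPACITY must be a power of two so (LOGAND index mask) wraps cheaply.
  (%make-spsc :buffer (make-array capacity) :mask (1- capacity)))

(defun spsc-push (q item)
  "Return T on success, NIL if the queue is full."
  (let ((tail (spsc-tail q)))
    (when (< (- tail (spsc-head q)) (length (spsc-buffer q)))
      (setf (svref (spsc-buffer q) (logand tail (spsc-mask q))) item)
      ;; Publish the slot by advancing TAIL only after the write above.
      (setf (spsc-tail q) (1+ tail))
      t)))

(defun spsc-pop (q)
  "Return the oldest item, or NIL if the queue is empty."
  (let ((head (spsc-head q)))
    (when (< head (spsc-tail q))
      (prog1 (svref (spsc-buffer q) (logand head (spsc-mask q)))
        (setf (spsc-head q) (1+ head))))))
```

Because neither operation loops or locks, the per-item cost is a couple of array accesses and index increments, which is why the SPSC variant nearly doubles the general-purpose MPMC throughput.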

### Single-Threaded Optimization (`:cl-freelock-single-threaded`)

This compile-time flag strips out multi-threaded safety features to generate more efficient code for single-threaded use cases.
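A hedged sketch of how such a flag is typically enabled in Common Lisp: push the keyword onto `*features*` before the system is compiled, so the library's `#+`/`#-` reader conditionals select the single-threaded code paths. (Exact usage may differ; consult the library's own documentation.)

```lisp
;; Enable the flag before the system is compiled (standard CL idiom;
;; the flag name comes from this page's heading).
(push :cl-freelock-single-threaded *features*)

;; Force recompilation so the reader conditionals take effect.
(asdf:load-system :cl-freelock :force t)
```

Because the selection happens at read time, the thread-safety machinery is absent from the compiled code entirely, rather than being skipped at runtime.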

| Benchmark (-st mode) | Throughput (ops/sec) | Comparison to Default MT |
|---|---|---|
| SPSC Queue (1P/1C) | ~7.2M | The MT version is slightly faster, likely due to compiler specifics. |
| Bounded Queue (1P/1C, Batch of 64) | ~14.6M | ~71% faster than the default multi-threaded build. |

The benchmarks confirm that the feature flag delivers a significant boost for the batching API. At over 14 million operations per second in this mode, the batching API shows how fast pure Common Lisp can be when the right algorithms are used.
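The batching numbers follow from amortization: one synchronized index update publishes an entire batch, dividing the per-item synchronization cost by the batch size (64 here). The non-concurrent sketch below, with a hypothetical `push-batch` helper, illustrates the shape of the operation (not cl-freelock's actual API).

```lisp
;; Illustrative only: copy a whole batch into a ring buffer, then advance
;; the tail once. A concurrent version would reserve the slots up front and
;; publish the new tail with a single atomic store.
(defun push-batch (buffer tail items mask)
  "Copy the vector ITEMS into ring BUFFER starting at TAIL; return new tail."
  (loop for item across items
        for i fixnum from tail
        do (setf (svref buffer (logand i mask)) item))
  (+ tail (length items)))
```

With a batch of 64, the synchronization overhead per item drops to roughly 1/64th of the single-item path, which is consistent with the near order-of-magnitude gains in the tables above.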

### Garbage Collection

Excessive memory allocation leads to increased garbage collection (GC) pressure, causing performance-killing pauses. cl-freelock is designed to be exceptionally memory-efficient.

| Queue Implementation (1P/1C) | Memory Allocated (1M items) |
|---|---|
| cl-freelock (SPSC) | ~0.13 MB |
| cl-freelock (Bounded) | ~0.13 MB |
| cl-freelock (Unbounded) | ~9.70 MB |
| oconnore/queues | ~10.06 MB |
| Lock-Based Queue | ~16.15 MB |

Figure 3. GC pressure comparison, Intel(R) Core(TM) Ultra 7 155H (16 Cores, 22 Logical Processors).

The specialized SPSC and Bounded queues allocate over 75 times less memory than the next-best competitor, making them a strong choice for projects where low latency and predictable performance are required.
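The allocation figures line up with how each design stores items. A preallocated ring buffer reuses the same vector slots for every item, while a list-based queue conses a fresh cell per enqueue; on a 64-bit Lisp a cons cell is 16 bytes, so 1M items is roughly 16 MB of garbage, matching the lock-based figure above. An illustrative contrast:

```lisp
;; Illustrative only: contrast the two allocation patterns.
(defun fill-ring (ring n)
  ;; Stores into a preallocated vector: no heap allocation per item.
  (dotimes (i n ring)
    (setf (svref ring (mod i (length ring))) i)))

(defun fill-list (n)
  ;; Each PUSH conses a fresh 16-byte cell: ~16 MB for 1M items.
  (let ((acc '()))
    (dotimes (i n acc)
      (push i acc))))

;; On SBCL, wrapping each call in (TIME ...) reports "bytes consed",
;; which is the quantity the table above measures.
```

The small residual allocation of the SPSC and Bounded queues (~0.13 MB) is consistent with one-time setup rather than per-item garbage, which is what keeps GC pauses out of the hot path.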


All benchmark data is available in our dedicated dataset branch: