# Benchmark Analysis
The performance of cl-freelock has been measured in a controlled environment on a local development machine to establish a clear performance profile. The results show how different queue designs affect throughput and memory efficiency under concurrent load.
These benchmarks reflect a real-world scenario on a modern laptop:
- Processor: Intel(R) Core(TM) Ultra 7 155H (16 Cores, 22 Logical Processors)
- Memory: 32.0 GB RAM
- Execution Environment: Arch Linux on WSL2 (Host OS: Windows 11 Home)
- Power State: Plugged in
This table directly compares the lock-free bounded queue against traditional lock-based approaches on a multi-core machine.
| Benchmark (1M items) | Throughput (ops/sec) | vs. Lock-Based | vs. oconnore/queues |
|---|---|---|---|
| cl-freelock (1P/1C) | ~3.8M | 1.7x faster | 1.7x faster |
| Lock-based list (1P/1C) | ~2.2M | - | - |
| oconnore/queues (1P/1C) | ~2.2M | - | - |
| cl-freelock (4P/4C) | ~2.9M | 1.5x faster | 2.9x faster |
| Lock-based list (4P/4C) | ~2.0M | - | - |
| oconnore/queues (4P/4C) | ~1.0M | - | - |

On a multi-core machine, cl-freelock outperforms both the lock-based and the other lock-free queue. As contention rises to 4 producers and 4 consumers, the gap widens: cl-freelock becomes 2.9x faster than oconnore/queues, showing how well its lock-free algorithm scales on multi-core hardware.
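For context, the lock-based baseline in these comparisons is the classic pattern of a queue guarded by a single mutex, where every operation serializes on one lock. A minimal sketch of such a baseline using the bordeaux-threads portability layer (this is an illustrative baseline, not cl-freelock's implementation):

```lisp
;; A mutex-guarded FIFO queue: the kind of lock-based baseline measured
;; above. Every enqueue/dequeue takes the same lock, which is what caps
;; its scaling as producer/consumer counts grow.
(defstruct locked-queue
  (head nil)
  (tail nil)
  (lock (bt:make-lock "queue-lock")))

(defun locked-enqueue (q item)
  "Append ITEM to the tail under the queue's single mutex."
  (bt:with-lock-held ((locked-queue-lock q))
    (let ((cell (cons item nil)))
      (if (locked-queue-tail q)
          (setf (cdr (locked-queue-tail q)) cell)
          (setf (locked-queue-head q) cell))
      (setf (locked-queue-tail q) cell))))

(defun locked-dequeue (q)
  "Pop from the head, or return NIL if the queue is empty."
  (bt:with-lock-held ((locked-queue-lock q))
    (let ((cell (locked-queue-head q)))
      (when cell
        (setf (locked-queue-head q) (cdr cell))
        (unless (locked-queue-head q)
          (setf (locked-queue-tail q) nil))
        (car cell)))))
```

Under 1P/1C the lock is rarely contested, so this design stays competitive; at 4P/4C all eight threads funnel through one mutex, which is where the gap in the table above opens up.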
The specialized APIs offer another tier of performance for specific use cases.
| Benchmark | Throughput (ops/sec) | Key Takeaway |
|---|---|---|
| SPSC Queue (1P/1C) | ~7.2M | ~1.9x the general-purpose MPMC queue's 1P/1C throughput (~3.8M ops/sec). |
| Bounded Queue (Batch of 64, 8P/8C) | ~34.1M | Roughly an order-of-magnitude speedup for bulk operations. |

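The batching win comes from amortizing synchronization: one reservation of queue slots covers 64 items instead of one. A hypothetical usage sketch of that idea (the function names below are placeholders, not necessarily cl-freelock's exported symbols; consult the library's API documentation for the real ones):

```lisp
;; Illustrative only -- MAKE-BOUNDED-QUEUE, ENQUEUE-BATCH and
;; DEQUEUE-BATCH are placeholder names for whatever batch API the
;; library exports.
(let ((q (make-bounded-queue 1024))
      (batch (make-array 64)))
  ;; Fill a reusable buffer, then hand the whole batch to the queue in
  ;; one call: producers pay the synchronization cost once per 64 items
  ;; rather than once per item.
  (dotimes (i 64)
    (setf (aref batch i) i))
  (enqueue-batch q batch)
  (dequeue-batch q batch))
```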
The single-threaded (`-st`) compile-time flag strips out multi-threaded safety features to generate more efficient code for single-threaded use cases.
| Benchmark (-st mode) | Throughput (ops/sec) | Comparison to Default MT |
|---|---|---|
| SPSC Queue (1P/1C) | ~7.2M | Essentially unchanged; the default MT build is marginally faster, likely due to compiler code-generation details. |
| Bounded Queue (1P/1C, Batch of 64) | ~14.6M | ~71% faster than the default multi-threaded build. |
The benchmarks confirm that the feature flag delivers a significant boost for the batching API: at over 14 million operations per second, it shows how fast pure Common Lisp can be when the right algorithms are used.
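Compile-time flags like this are conventionally selected in Common Lisp by pushing a keyword onto `*features*` before the system is compiled, so that `#+`/`#-` reader conditionals pick the single-threaded code paths. The keyword below is an assumption for illustration; check cl-freelock's system definition for the actual flag:

```lisp
;; Hypothetical flag name -- see the cl-freelock .asd file for the
;; feature keyword its -st build actually checks.
(push :cl-freelock-st *features*)

;; Force recompilation so the #+/#- reader conditionals take effect
;; on the already-loaded system.
(asdf:load-system :cl-freelock :force t)
```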
Excessive memory allocation leads to increased garbage collection (GC) pressure, causing performance-killing pauses. cl-freelock is designed to be exceptionally memory-efficient.
| Queue Implementation (1P/1C) | Memory Allocated (1M items) |
|---|---|
| cl-freelock (SPSC) | ~0.13 MB |
| cl-freelock (Bounded) | ~0.13 MB |
| cl-freelock (Unbounded) | ~9.70 MB |
| oconnore/queues | ~10.06 MB |
| Lock-Based Queue | ~16.15 MB |

The specialized SPSC and bounded queues are in a class of their own, allocating over 75x less memory than the next-closest implementation. This makes them a strong fit for projects where low latency and predictable performance are required.
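On SBCL, per-run allocation figures like those in the table can be reproduced by sampling `sb-ext:get-bytes-consed` around the workload. A minimal sketch (the benchmark call is a placeholder for whichever queue implementation is being measured):

```lisp
;; SBCL-specific: GET-BYTES-CONSED returns the total bytes allocated
;; since startup, so the difference across a run is the workload's
;; allocation.
(defun measure-allocation (thunk)
  "Run THUNK and return the number of bytes it allocated."
  (sb-ext:gc :full t)                     ; settle the heap first
  (let ((before (sb-ext:get-bytes-consed)))
    (funcall thunk)
    (- (sb-ext:get-bytes-consed) before)))

;; Usage (RUN-QUEUE-BENCHMARK is a placeholder):
;; (measure-allocation (lambda () (run-queue-benchmark q 1000000)))
```

The fixed-size SPSC and bounded queues allocate their ring buffers once up front, which is why their totals stay near ~0.13 MB regardless of how many items flow through, while the linked-node designs cons a fresh cell per item.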
All benchmark data is available in our dedicated dataset branch:
- CSV File: benchmark_results.csv
- Raw Data Branch: dataset