A fast, single-threaded limit order book written in modern C++23.
This project started as a playground to explore how far you can push a matching engine on a single core using modern C++ and cache-friendly data structures.
Not for production use!
- Realistic, readable data structures rather than unreadable "ultra-fast" custom hash maps, while still aiming for constant-time access and good cache locality
- Experiment with and compare different data structures and algorithms, and keep the one that proves most optimal
So think of it as a blueprint for a potential production implementation.
- Add Order: O(log M) for new price level, O(1) for existing level
- Cancel Order: O(1) via direct iterator access
- Match Order: O(K) where K = orders matched
- Best Bid/Ask: O(1) via map.begin()
- Modify Order: O(1) for quantity increase
M = number of active price levels, K = orders to match
Implemented:
- Limit orders
- Market orders
- FIFO price-time priority
- Partial fills
- Order cancellation in O(1)
- Unit tests

Planned:
- IOC / FOK / GTC
- Concurrency support
- Stop orders
- Order modification
- Market data queries
- Iceberg orders
- Self-trade prevention
- Visualization
Requirements:
- C++23 compiler (clang++ 15+ or g++ 12+)
- CMake 3.27+
Quick start:
```
make run
```

Results from ./build/benchmark:
| Operation | Throughput | Latency |
|---|---|---|
| Insert | ~11M ops/sec | 90ns |
| Cancel | ~93M ops/sec | 11ns |
| Match | ~48M ops/sec | 21ns |
| Best bid/ask queries | ~885M ops/sec | 1.2ns |
It measures the average time per single operation, including all overheads (allocation, lookups).
Latency (ns/op) = (Total Time in ms × 1,000,000) / Number of Operations
Example:
- 100,000 insertions in 8.94ms
- Latency = (8.94 × 1,000,000) / 100,000 = 89.4 ns/op
- Price levels: `std::map<Price, OrderList>` (separate maps for bids and asks)
- Order queue: `std::list<Order>` per price level (FIFO)
- Order lookup: `std::unordered_map<OrderId, iterator>`
- Flat array + bitset - O(1) access, best for bounded price ranges (equities, futures)
- `std::flat_map` - sorted vector, better cache locality than a tree
- AVL tree - strictly balanced, faster lookups than a red-black tree, slower inserts
- Skip list - probabilistic balancing, easier implementation, similar O(log M)
- Min/max heap - O(1) best bid/ask, O(log M) insert/delete, poor for cancellation
- `Boost.Intrusive` list - zero allocation overhead, 20-50% faster
- `std::deque` - better cache locality, but invalidates iterators on resize
- Custom ring buffer - fixed size, pre-allocated, excellent cache performance
- `std::vector` - best cache locality, but O(n) erase (not suitable for FIFO)
A few words about `std::shared_ptr`. Many implementations use it, but I have decided to use `std::pmr` instead, because `std::shared_ptr` brings:
- poor cache locality (objects scattered across the heap)
- false ownership semantics: the order book is the single owner, so nothing is actually shared
- atomic reference-count operations on every copy and destroy
- two heap allocations per order (the object and its control block)
Concurrency options considered:
- Global mutex - easy to implement, but poor scalability: a single bottleneck with huge overhead
- Read-write lock - concurrent reads, but writes are still slow
- Thread per symbol - shard the book by instrument, no shared state between engines
- SPSC queue - a single producer hands orders to the single matching thread