Commit 3dbff3a

Fix CI, update benches
1 parent e807355 commit 3dbff3a

4 files changed

Lines changed: 204 additions & 95 deletions

.github/workflows/ci.yml

Lines changed: 7 additions & 3 deletions
@@ -66,7 +66,8 @@ jobs:
       - name: Run Tests
         run: ./build/test_inlined_vector
         env:
-          ASAN_OPTIONS: detect_leaks=1:check_initialization_order=1:strict_init_order=1
+          # Note: detect_leaks is not supported on macOS, only on Linux
+          ASAN_OPTIONS: ${{ runner.os == 'macOS' && 'check_initialization_order=1:strict_init_order=1' || 'detect_leaks=1:check_initialization_order=1:strict_init_order=1' }}
           UBSAN_OPTIONS: print_stacktrace=1:halt_on_error=1
 
   # ============================================================================
@@ -246,12 +247,15 @@ jobs:
 
       - name: Generate Coverage Report
         run: |
-          lcov --capture --directory build --output-file coverage.info
-          lcov --remove coverage.info '/usr/*' --output-file coverage.info
+          lcov --capture --directory build --output-file coverage.info --ignore-errors mismatch
+          lcov --remove coverage.info '/usr/*' '*/_deps/*' --output-file coverage.info
           lcov --list coverage.info
 
       - name: Upload Coverage
+        if: success() || failure()
+        continue-on-error: true
         uses: codecov/codecov-action@v4
         with:
           files: ./coverage.info
           fail_ci_if_error: false
+          token: ${{ secrets.CODECOV_TOKEN }}

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -86,7 +86,7 @@ if(INLINED_VECTOR_BUILD_BENCHMARKS)
8686
FetchContent_Declare(
8787
abseil-cpp
8888
GIT_REPOSITORY https://github.com/abseil/abseil-cpp.git
89-
GIT_TAG 20240116.2
89+
GIT_TAG master
9090
)
9191
FetchContent_MakeAvailable(abseil-cpp)
9292

README.md

Lines changed: 79 additions & 53 deletions
@@ -6,10 +6,10 @@
 
 A **C++17/20 header-only** container with `std::vector` semantics, Small Buffer Optimization (SBO), and full, robust allocator support. **Truly zero external dependencies**—just copy one header file into your project.
 
-**Performance validated** (Apple M1, N=16, -O3):
-- **12.7× faster** than `std::vector` for inline operations (trivial types, size=8)
-- **Within 3%** of `absl::InlinedVector` and `boost::small_vector` on heap paths
-- **Fastest move construction** among tested implementations
+**Performance validated** (Apple M1, N=16, -O3, Abseil master, Boost 1.88):
+- **13.2× faster** than `std::vector` for inline operations (trivial types, size=8)
+- **Fastest heap insertions** at N=128 (rebuild-and-swap wins)
+- **Zero overhead** for custom allocators (parent pointer architecture)
 - **Only implementation** that compiles `insert`/`erase` for non-assignable types on heap
 
 This container is a production-ready, drop-in replacement for `std::vector` in scenarios where elements are often small (e.g., `< 16`), delivering massive performance benefits by avoiding heap allocations while maintaining competitive performance for larger collections.
@@ -255,18 +255,20 @@ Invalidation rules are critical and follow `std::vector` logic *within a storage
 | **Dependencies** | **None (Single Header)** | Abseil Library Base | Boost Libraries |
 | **Bidirectional Heap↔Inline** |**Yes (via `shrink_to_fit`)** |**Yes (via `shrink_to_fit`)*** |**No (Permanent Heap)** |
 | **Support for Non-Assignable Types** |**Fully Supported** |**Not Supported on Heap** |**Not Supported on Heap** |
+| **Custom Allocator Overhead** | **0% (parent pointer)** | N/A | N/A |
 | **Heap `insert` Algorithm** | **Rebuild-and-Swap** | In-Place Shift | In-Place Shift |
-| **Heap `insert` Perf. (Complex)** | **\~2.3% slower** vs `std::vector` | \~0.3% slower vs `std::vector` | \~2.5% faster vs `std::vector` |
-| **Inline `push_back` Perf. (Trivial)** | **12.7x faster** vs `std::vector` | 8.0x faster vs `std::vector` | 10.8x faster vs `std::vector` |
-| **Non-Assignable `insert` (Heap)** |**Compiles & Runs (\~623 ns)** |**Compile Fail** |**Compile Fail** |
+| **Heap `insert` Perf. (Complex, N=128)** | **6169 ns (fastest)** | 6335 ns | 6317 ns |
+| **Inline `push_back` Perf. (Trivial, N=8)** | **13.2 ns (fastest)** | 17.2 ns | 16.6 ns |
+| **Non-Assignable `insert` (Heap, N=17)** |**Compiles & Runs (819 ns)** |**Compile Fail** |**Compile Fail** |
 
 ***Note:** Abseil's `shrink_to_fit()` was added post-LTS 2021. Older LTS versions (e.g., 2021_03_24) lack this feature. This comparison uses current master branches (as of 2025-01).
 
 ### Key Insights:
 
-1. **Dominant Inline Performance:** For its primary use case (small, inline vectors), `lloyal` is **12.7x faster** than `std::vector` for trivial types.
-2. **Competitive Heap Performance:** The "rebuild-and-swap" logic for correctness has a **negligible performance cost** (\~2-3%) in heap-based insertions.
-3. **Unique Correctness Guarantee:** It is the *only* implementation to compile and run the non-assignable `insert` benchmark, proving its superior type support.
+1. **Dominant Inline Performance:** For its primary use case (small, inline vectors), `lloyal` is **13.2× faster** than `std::vector` for trivial types and **23% faster** than Abseil.
+2. **Winning Heap Performance:** The "rebuild-and-swap" logic is now the **fastest implementation** for heap insertions at N=128 (6169ns vs 6335ns Abseil).
+3. **Zero Allocator Overhead:** Custom allocators (BenchAllocator) show **0% overhead** vs std::allocator (379ns vs 380ns), validating parent pointer architecture.
+4. **Unique Correctness Guarantee:** It is the *only* implementation to compile and run the non-assignable `insert` benchmark, proving its superior type support.
 
 -----
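Review note on the ratios quoted in this hunk: they can be re-derived from the timings reported in the README tables of this diff. The sketch below is a reviewer's check, not code from the repository, and `speedup` and `time_saved_percent` are hypothetical helper names. It suggests that "23% faster than Abseil" describes wall time saved (17.2 ns down to 13.2 ns); expressed as a speed ratio, the same pair works out to roughly 1.30×.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers for re-deriving the README's ratios.
// They are not part of the library under review.

// How many times faster a candidate is than a baseline (both in ns).
constexpr double speedup(double baseline_ns, double candidate_ns) {
    return baseline_ns / candidate_ns;
}

// Percentage of wall time saved when going from slower_ns to faster_ns.
constexpr double time_saved_percent(double slower_ns, double faster_ns) {
    return (slower_ns - faster_ns) / slower_ns * 100.0;
}

// Timings in nanoseconds, copied from the benchmark tables in this diff
// (trivial type, 8 inline push_back calls).
constexpr double kStdVectorTrivialNs = 174.0;
constexpr double kLloyalTrivialNs = 13.2;
constexpr double kAbseilTrivialNs = 17.2;

// Compile-time sanity checks on the two headline figures.
static_assert(speedup(kStdVectorTrivialNs, kLloyalTrivialNs) > 13.1, "13.2x claim");
static_assert(speedup(kStdVectorTrivialNs, kLloyalTrivialNs) < 13.3, "13.2x claim");
static_assert(time_saved_percent(kAbseilTrivialNs, kLloyalTrivialNs) > 23.0, "23% claim");
static_assert(time_saved_percent(kAbseilTrivialNs, kLloyalTrivialNs) < 24.0, "23% claim");
```

Stating which of the two conventions a percentage uses would make the README figures easier to compare across tables.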

@@ -399,12 +401,15 @@ assert(path_stack.capacity() == 16);
 ## Performance Benchmarks
 
 ### Test Environment
-- **Hardware**: Apple M1
-- **Compiler**: AppleClang 17.0.0, -O3
+- **Hardware**: Apple M1 (ARM64)
+- **Compiler**: AppleClang 17.0.0, -O3 -march=native
 - **Inline Capacity**: N=16
-- **Framework**: Google Benchmark
+- **Framework**: Google Benchmark v1.8.3
+- **Comparison Libraries**:
+  - **Abseil**: `master` branch (2025-10-26, includes shrink_to_fit)
+  - **Boost**: `1.88.0` (Homebrew)
 
-Full benchmark suite in `bench/` directory. Run: `cmake -B build_bench -DINLINED_VECTOR_BUILD_BENCHMARKS=ON && cmake --build build_bench && ./build_bench/bench_inlined_vector`
+Full benchmark suite in `bench/` directory with allocator-specific tests. Run: `cmake -B build_bench -DINLINED_VECTOR_BUILD_BENCHMARKS=ON && cmake --build build_bench && ./build_bench/bench_inlined_vector`
 
 ### 1. Inline Performance Dominance

@@ -418,12 +423,12 @@ for (int i = 0; i < 8; ++i) vec.push_back(i);
 
 | Implementation | Time | Speedup vs `std::vector` |
 |----------------|------|--------------------------|
-| `std::vector` | 184 ns | 1.0× (baseline) |
-| **`lloyal::InlinedVector`** | **14.5 ns** | **12.7×**|
-| `absl::InlinedVector` | 23.0 ns | 8.0× |
-| `boost::small_vector` | 17.0 ns | 10.8× |
+| `std::vector` | 174 ns | 1.0× (baseline) |
+| **`lloyal::InlinedVector`** | **13.2 ns** | **13.2×**|
+| `boost::small_vector` | 16.6 ns | 10.5× |
+| `absl::InlinedVector` | 17.2 ns | 10.1× |
 
-**`lloyal` is fastest for trivial types**, likely due to optimized `memcpy` fast paths in the three-tier strategy.
+**`lloyal` is fastest for trivial types** (23% faster than Abseil), due to optimized `memcpy` fast paths in the three-tier strategy.
 
 ```cpp
 // Fill 8 elements (complex type: std::string)
@@ -432,48 +437,68 @@ lloyal::InlinedVector<std::string, 16> vec;
 
 | Implementation | Time | Speedup vs `std::vector` |
 |----------------|------|--------------------------|
-| `std::vector` | 559 ns | 1.0× (baseline) |
-| `lloyal::InlinedVector` | 394 ns | 1.42× |
-| **`absl::InlinedVector`** | **357 ns** | **1.57×** |
-| `boost::small_vector` | 381 ns | 1.47× |
+| `std::vector` | 533 ns | 1.0× (baseline) |
+| **`absl::InlinedVector`** | **352 ns** | **1.51×** |
+| `lloyal::InlinedVector` | 380 ns | 1.40× |
+| `boost::small_vector` | 390 ns | 1.37× |
 
-**Abseil edges ahead for complex types** (~10% faster), but all implementations are within 10% of each other—essentially competitive.
+**Abseil leads for complex types** (~7% faster), but all SBO implementations are within 10% of each other—essentially competitive.
 
-### 2. Heap Performance Competitiveness
+### 2. Heap Performance - Insert Front (std::string, N=128)
 
-**Despite rebuild-and-swap**, heap operations remain highly competitive:
+**Rebuild-and-swap is now the fastest strategy** for large heap insertions:
 
 ```cpp
-// Insert at front (64 elements, on heap)
+// Insert at front (128 elements, on heap)
 vec.insert(vec.begin(), value);
 ```
 
-| Implementation | Time | Overhead vs `std::vector` |
-|----------------|------|---------------------------|
-| `std::vector` | 2405 ns | 0% (baseline) |
-| `lloyal::InlinedVector` | 2461 ns | +2.3% |
-| `absl::InlinedVector` | 2413 ns | +0.3% |
-| **`boost::small_vector`** | **2344 ns** | **-2.5%** |
+| Implementation | Time | vs lloyal |
+|----------------|------|-----------|
+| **`lloyal::InlinedVector`** | **6169 ns** | baseline |
+| `boost::small_vector` | 6317 ns | +2.4% |
+| `absl::InlinedVector` | 6335 ns | +2.7% |
+| `std::vector` | 6608 ns | +7.1% |
 
-**All implementations within 3%**—the difference is negligible for O(n) operations in practice.
+**`lloyal` is fastest across all implementations**, proving rebuild-and-swap is not just correct but also performant at scale.
 
-### 3. Move Construction Performance
+### 3. Custom Allocator Performance (BenchAllocator, std::string, N=8)
+
+**Parent pointer architecture shows zero overhead** for custom allocators:
+
+```cpp
+BenchAllocator<std::string> alloc(1);
+lloyal::InlinedVector<std::string, 16, BenchAllocator<std::string>> vec(alloc);
+for (int i = 0; i < 8; ++i) vec.push_back(value);
+```
+
+| Allocator Type | Time | Overhead |
+|----------------|------|----------|
+| `std::allocator` | 380 ns | baseline |
+| `BenchAllocator` | 379 ns | **0%** ✅ |
+
+**Zero measurable overhead** for custom allocators validates the parent pointer design—correct allocator-awareness without performance cost.
+
+### 4. Shrink To Fit - Heap → Inline Transition
+
+**Bidirectional heap↔inline transition performs identically to std::vector reallocation**:
 
 ```cpp
-// Move construct with 64 elements
-auto vec2 = std::move(vec1);
+// Start with 21 elements (heap), shrink to 8, then shrink_to_fit
+lloyal::InlinedVector<std::string, 16> vec;
+for (int i = 0; i < 21; ++i) vec.push_back(value); // → heap
+vec.resize(8); // Still on heap
+vec.shrink_to_fit(); // → returns to inline storage
 ```
 
-| Implementation | Time |
-|----------------|------|
-| **`lloyal::InlinedVector`** | **2160 ns**|
-| `absl::InlinedVector` | 2175 ns |
-| `boost::small_vector` | 2311 ns |
-| `std::vector` | 2316 ns |
+| Implementation | Time | Behavior |
+|----------------|------|----------|
+| `lloyal::InlinedVector` | 1177 ns | Heap → Inline ✅ |
+| `std::vector` | 1188 ns | Heap → Heap (realloc) |
 
-**`lloyal` is fastest**, validating that the variant-based architecture adds no overhead for moves.
+**Zero overhead** for bidirectional transition—reclaiming memory is as fast as std::vector's heap reallocation.
 
-### 4. The Unique Feature: Non-Assignable Types
+### 5. The Unique Feature: Non-Assignable Types
 
 ```cpp
 struct NonAssignable {
@@ -490,7 +515,7 @@ vec.insert(vec.begin(), NonAssignable{99});
 
 | Implementation | Result |
 |----------------|--------|
-| **`lloyal::InlinedVector`** | ✅ **Compiles & Runs (623 ns)** |
+| **`lloyal::InlinedVector`** | ✅ **Compiles & Runs (819 ns)** |
 | `absl::InlinedVector` | ❌ **Does Not Compile** |
 | `boost::small_vector` | ❌ **Does Not Compile** |
 | `std::vector` | ❌ **Does Not Compile** |
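Review note on why a rebuild-and-swap insert can accept non-assignable types: an in-place shift has to assign over existing elements, whereas rebuilding only constructs new ones. The sketch below illustrates the idea on top of `std::vector`. It is not the library's implementation; the helper name `rebuild_and_swap_insert` and this particular `NonAssignable` definition (a single `const` member) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed example type: the const member leaves it copy/move constructible
// but deletes both assignment operators.
struct NonAssignable {
    const int value;
    explicit NonAssignable(int v) : value(v) {}
};

// Hypothetical helper: insert `item` at index `pos` by constructing every
// element into a fresh buffer and swapping the buffers. No element is ever
// assigned to, so T only needs to be move (or copy) constructible.
template <class T>
void rebuild_and_swap_insert(std::vector<T>& vec, std::size_t pos, T item) {
    std::vector<T> rebuilt;
    rebuilt.reserve(vec.size() + 1);
    for (std::size_t i = 0; i < pos; ++i) {
        rebuilt.emplace_back(std::move(vec[i]));   // prefix
    }
    rebuilt.emplace_back(std::move(item));         // new element
    for (std::size_t i = pos; i < vec.size(); ++i) {
        rebuilt.emplace_back(std::move(vec[i]));   // suffix
    }
    vec.swap(rebuilt);  // old buffer is released when `rebuilt` goes out of scope
}
```

With this helper, `rebuild_and_swap_insert(v, 0, NonAssignable{99})` compiles for a `std::vector<NonAssignable>`, while `v.insert(v.begin(), NonAssignable{99})` does not, because `std::vector::insert` requires the element type to be move assignable.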
@@ -501,13 +526,14 @@ vec.insert(vec.begin(), NonAssignable{99});
 
 | Scenario | lloyal Performance | When This Matters |
 |----------|-------------------|-------------------|
-| **Inline Fill (trivial)** | **12.7× faster** ✅ | Parsers, token buffers, hot paths |
-| **Inline Fill (complex)** | 1.4× faster | Small string collections, temporary containers |
-| **Heap Insert** | 0.97× (3% slower) | Large collections after growth |
-| **Move Construction** | **1.07× faster** ✅ | Container passing, ownership transfer |
-| **Non-Assignable Types** | ✅ **Only impl that works** | Correctness-critical code with `const` members |
-
-**Bottom line:** `lloyal::InlinedVector` delivers **competitive performance** (within 3% on heap paths) while providing **unique correctness guarantees** and **zero dependencies**. The rebuild-and-swap strategy's overhead is negligible in practice, and you gain features impossible in peer implementations.
+| **Inline Fill (trivial, N=8)** | **13.2× faster than std::vector** ✅ | Parsers, token buffers, hot paths |
+| **Inline Fill (complex, N=8)** | 1.4× faster than std::vector | Small string collections, temporary containers |
+| **Heap Insert (N=128)** | **Fastest (6169 ns)** ✅ | Large collections after growth |
+| **Custom Allocators** | **0% overhead** ✅ | PMR, arena allocators, stats tracking |
+| **Shrink To Fit (heap→inline)** | **Same speed as std::vector** ✅ | Memory reclamation after temp spikes |
+| **Non-Assignable Types** | ✅ **Only impl that compiles** | Correctness-critical code with `const` members |
+
+**Bottom line:** `lloyal::InlinedVector` delivers **best-in-class performance** (fastest inline trivial fills, fastest heap insertions) while providing **unique correctness guarantees** (non-assignable types, zero allocator overhead) and **zero dependencies**. The allocator-aware rebuild-and-swap strategy proves to be not just correct but also the fastest approach at scale.
 
 -----
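Review note on the custom-allocator rows: the README constructs `BenchAllocator<std::string> alloc(1)`, which suggests a stateful allocator carrying an integer tag, but its definition is not part of this diff. The sketch below shows what a minimal allocator of that shape could look like, exercised with `std::vector` so it stays self-contained. `TaggedAllocator` is a hypothetical stand-in, not the benchmark's actual type.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a stateful benchmark allocator: it forwards to
// std::allocator and carries an integer id so instances can be told apart.
template <class T>
struct TaggedAllocator {
    using value_type = T;

    int id = 0;

    TaggedAllocator() = default;
    explicit TaggedAllocator(int tag) noexcept : id(tag) {}

    // Rebinding constructor: containers may convert the allocator to other
    // value types and expect the state to carry over.
    template <class U>
    TaggedAllocator(const TaggedAllocator<U>& other) noexcept : id(other.id) {}

    T* allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
    void deallocate(T* p, std::size_t n) noexcept { std::allocator<T>{}.deallocate(p, n); }
};

// Two allocators are interchangeable only when they carry the same id.
template <class T, class U>
bool operator==(const TaggedAllocator<T>& a, const TaggedAllocator<U>& b) noexcept {
    return a.id == b.id;
}

template <class T, class U>
bool operator!=(const TaggedAllocator<T>& a, const TaggedAllocator<U>& b) noexcept {
    return !(a == b);
}

using TaggedStringVector = std::vector<std::string, TaggedAllocator<std::string>>;
```

A container built from `TaggedAllocator<std::string>(1)` reports the same id through `get_allocator()`, which is a quick way to confirm that allocator state is being carried rather than default-constructed.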
