223 commits
a4a2c7a
tier-first search order in get() for better cache locality
joshuaisaact Mar 23, 2026
25c5de2
Revert "tier-first search order in get() for better cache locality"
joshuaisaact Mar 23, 2026
330ddb6
cross-tier prefetching in get() to hide inter-tier latency
joshuaisaact Mar 23, 2026
d5d671a
Revert "cross-tier prefetching in get() to hide inter-tier latency"
joshuaisaact Mar 23, 2026
f9fc247
remove all prefetching from get() to test if it helps or hurts
joshuaisaact Mar 23, 2026
18b4ce8
add tier-0 probe-0 fast path in get()
joshuaisaact Mar 23, 2026
446e450
Revert "add tier-0 probe-0 fast path in get()"
joshuaisaact Mar 23, 2026
b48b0a3
reduce MAX_PROBES from 32 to 24
joshuaisaact Mar 23, 2026
6bbdd09
reduce MAX_PROBES from 24 to 20
joshuaisaact Mar 23, 2026
6ca9ddc
cap get() tier search to 8 tiers
joshuaisaact Mar 23, 2026
d42e9ee
reduce MAX_LOOKUP_TIERS from 8 to 6
joshuaisaact Mar 23, 2026
33c9feb
Revert "reduce MAX_LOOKUP_TIERS from 8 to 6"
joshuaisaact Mar 23, 2026
9530f6c
use fixed-size arrays for tier metadata instead of heap slices
joshuaisaact Mar 23, 2026
1c52805
Revert "use fixed-size arrays for tier metadata instead of heap slices"
joshuaisaact Mar 23, 2026
907f4f8
switch hash to Stafford variant 13 (splitmix64 finalizer)
joshuaisaact Mar 23, 2026
a8629ca
Revert "switch hash to Stafford variant 13 (splitmix64 finalizer)"
joshuaisaact Mar 23, 2026
bbf67f0
switch to linear probing for better cache locality
joshuaisaact Mar 23, 2026
6e5b3dc
add early exit on empty bucket slots in get()
joshuaisaact Mar 23, 2026
e580524
Revert "add early exit on empty bucket slots in get()"
joshuaisaact Mar 23, 2026
5f62379
combined fp+empty SIMD check with per-tier early exit
joshuaisaact Mar 23, 2026
ac8b602
Revert "combined fp+empty SIMD check with per-tier early exit"
joshuaisaact Mar 23, 2026
6f57a0a
double tier 0 size to concentrate elements for faster lookup
joshuaisaact Mar 23, 2026
f7328f9
quadruple tier 0 size (2x capacity / BUCKET_SIZE)
joshuaisaact Mar 23, 2026
be1c046
Revert "quadruple tier 0 size (2x capacity / BUCKET_SIZE)"
joshuaisaact Mar 23, 2026
710938e
reduce MAX_LOOKUP_TIERS from 8 to 4 with larger tier 0
joshuaisaact Mar 23, 2026
3d427e3
reduce MAX_LOOKUP_TIERS from 4 to 3
joshuaisaact Mar 23, 2026
6188a77
reduce MAX_LOOKUP_TIERS from 3 to 2
joshuaisaact Mar 23, 2026
c9ea0fe
reduce MAX_LOOKUP_TIERS to 1 (tier 0 only)
joshuaisaact Mar 23, 2026
8c9b749
simplify get() to single tier 0 loop, no tier iteration
joshuaisaact Mar 23, 2026
eaf5bbe
Revert "simplify get() to single tier 0 loop, no tier iteration"
joshuaisaact Mar 23, 2026
a4c6c90
use inline for to unroll probe loop in get()
joshuaisaact Mar 23, 2026
db48705
Revert "use inline for to unroll probe loop in get()"
joshuaisaact Mar 23, 2026
4841fba
reduce MAX_PROBES from 20 to 10
joshuaisaact Mar 23, 2026
4260bd4
reduce MAX_PROBES from 10 to 8
joshuaisaact Mar 23, 2026
004b0b7
reduce MAX_PROBES from 8 to 6
joshuaisaact Mar 23, 2026
03b078a
Revert "reduce MAX_PROBES from 8 to 6"
joshuaisaact Mar 23, 2026
2f6f0b8
increase BUCKET_SIZE to 32 for AVX2, generic mask types
joshuaisaact Mar 23, 2026
94cd8b0
Revert "increase BUCKET_SIZE to 32 for AVX2, generic mask types"
joshuaisaact Mar 23, 2026
784e8ca
try quadratic probing instead of linear
joshuaisaact Mar 23, 2026
eb3837d
Revert "try quadratic probing instead of linear"
joshuaisaact Mar 23, 2026
6f15cd3
inline for unroll 8-probe loop and eliminate tier loop
joshuaisaact Mar 23, 2026
ab64ffb
Revert "inline for unroll 8-probe loop and eliminate tier loop"
joshuaisaact Mar 23, 2026
dd36b35
simplify get() to tier-0-only with comptime MAX_PROBES loop bound
joshuaisaact Mar 23, 2026
e070ea9
remove insert prefetch
joshuaisaact Mar 23, 2026
9ab9cc0
delay batch transition to 90% fill (was 75%) to keep more in tier 0
joshuaisaact Mar 23, 2026
ace8f81
limit insertIntoTier probe depth to MAX_PROBES for lookup alignment
joshuaisaact Mar 23, 2026
4a423da
try 4x tier 0 capacity with probe-limited insert
joshuaisaact Mar 23, 2026
dd35c71
Revert "try 4x tier 0 capacity with probe-limited insert"
joshuaisaact Mar 23, 2026
bd6cd0b
tighten batch threshold to 0.08 (92% fill)
joshuaisaact Mar 23, 2026
c6a1a44
optimize remove() to tier-0-only with comptime loop bound
joshuaisaact Mar 23, 2026
5b88308
try faster 64-bit multiply hash
joshuaisaact Mar 23, 2026
56af0b9
try single-multiply hash
joshuaisaact Mar 23, 2026
97c8fa3
noinline findKeyInBucket to keep hot path tight
joshuaisaact Mar 23, 2026
49e1c9c
Revert "noinline findKeyInBucket to keep hot path tight"
joshuaisaact Mar 23, 2026
532520f
fast path for single FP match in findKeyInBucket
joshuaisaact Mar 23, 2026
2ae4de7
cache tier0 metadata in struct fields for faster get/remove
joshuaisaact Mar 23, 2026
022c478
hardcode tier0_start=0, eliminate addition in get/remove
joshuaisaact Mar 23, 2026
6f82228
remove unused tier0_start field
joshuaisaact Mar 23, 2026
7ab6486
try MAX_PROBES=6 again with all other optimizations
joshuaisaact Mar 23, 2026
2ce441c
Revert "try MAX_PROBES=6 again with all other optimizations"
joshuaisaact Mar 23, 2026
c9e700c
cache bucket mask directly, eliminate subtraction per probe
joshuaisaact Mar 23, 2026
740bc74
prefetch keys for probe 0 to hide key read latency
joshuaisaact Mar 23, 2026
1e118e7
Revert "prefetch keys for probe 0 to hide key read latency"
joshuaisaact Mar 23, 2026
bfd53c3
branchless fingerprint using 7 bits + 1 (range 1-128)
joshuaisaact Mar 23, 2026
2200af5
Revert "branchless fingerprint using 7 bits + 1 (range 1-128)"
joshuaisaact Mar 23, 2026
23f2a59
reorder struct fields: hot path first for cache line alignment
joshuaisaact Mar 23, 2026
37c7627
try inline for with precomputed probe array
joshuaisaact Mar 23, 2026
f6b581a
Revert "try inline for with precomputed probe array"
joshuaisaact Mar 23, 2026
90f051a
use bits 32-39 for fingerprint, less correlation with bucket index
joshuaisaact Mar 23, 2026
45de2ab
try bits 24-31 for fingerprint
joshuaisaact Mar 23, 2026
b3fdcef
ultra-fast path: check slot 0 with validity check
joshuaisaact Mar 23, 2026
e4a3802
separate probe 0 check to avoid redundant work in main loop
joshuaisaact Mar 23, 2026
bb13814
unroll probes 0 and 1 before the loop
joshuaisaact Mar 23, 2026
4718a34
Revert "unroll probes 0 and 1 before the loop"
joshuaisaact Mar 23, 2026
53ffd0a
prefetch values for probe 0 while SIMD runs
joshuaisaact Mar 23, 2026
ebb9f26
Revert "prefetch values for probe 0 while SIMD runs"
joshuaisaact Mar 23, 2026
e22f93b
branchless fingerprint using max/min clamp
joshuaisaact Mar 23, 2026
97389fa
Revert "branchless fingerprint using max/min clamp"
joshuaisaact Mar 23, 2026
6600b40
batch threshold 0.06 (94% fill)
joshuaisaact Mar 23, 2026
f0f51aa
Revert "batch threshold 0.06 (94% fill)"
joshuaisaact Mar 23, 2026
28eded4
batch threshold 0.12 (88% fill)
joshuaisaact Mar 23, 2026
4e2f019
batch threshold 0.15 (85% fill)
joshuaisaact Mar 23, 2026
f380498
Revert "batch threshold 0.15 (85% fill)"
joshuaisaact Mar 23, 2026
7aa8cd6
add experiment results, benchmark harness, and autoresearch artifacts
joshuaisaact Mar 23, 2026
1393871
add rigorous abseil comparison, correct stale README claims
joshuaisaact Mar 23, 2026
c886b94
add program-v2: optimization target is now abseil flat_hash_map
joshuaisaact Mar 23, 2026
34e40e3
add AGENTS.md with non-discoverable repo guidance
joshuaisaact Mar 23, 2026
7807ffe
interleave keys and values into entries array for cache locality
joshuaisaact Mar 23, 2026
236afac
reduce MAX_PROBES 8 to 7
joshuaisaact Mar 23, 2026
0675665
reduce MAX_PROBES 7 to 6
joshuaisaact Mar 23, 2026
984921d
Revert "reduce MAX_PROBES 7 to 6"
joshuaisaact Mar 23, 2026
7b36a0a
two-round multiply hash for better fingerprint distribution
joshuaisaact Mar 23, 2026
83d89ba
Revert "two-round multiply hash for better fingerprint distribution"
joshuaisaact Mar 23, 2026
fb76b19
use bits 56-63 for fingerprint (highest byte, most independent)
joshuaisaact Mar 23, 2026
fbfbbf2
Revert "use bits 56-63 for fingerprint (highest byte, most independent)"
joshuaisaact Mar 23, 2026
fce62d7
batch threshold 0.08 (92% fill before tier 1)
joshuaisaact Mar 23, 2026
4cb90c7
Revert "batch threshold 0.08 (92% fill before tier 1)"
joshuaisaact Mar 23, 2026
93e4a08
early termination in get() on empty fingerprint slots
joshuaisaact Mar 23, 2026
f21570b
Revert "early termination in get() on empty fingerprint slots"
joshuaisaact Mar 23, 2026
2cf2a29
return value directly from findValueInBucket to avoid re-indexing
joshuaisaact Mar 23, 2026
749a1e9
mark get() as inline
joshuaisaact Mar 23, 2026
6fa6f37
Revert "mark get() as inline"
joshuaisaact Mar 23, 2026
68cb2f5
golden ratio hash constant 0x9E3779B97F4A7C15
joshuaisaact Mar 23, 2026
cedfe75
Revert "golden ratio hash constant 0x9E3779B97F4A7C15"
joshuaisaact Mar 23, 2026
46f1ae9
minimal early termination via matchEmpty after findValueInBucket miss
joshuaisaact Mar 23, 2026
6768f0c
Revert "minimal early termination via matchEmpty after findValueInBuc…
joshuaisaact Mar 23, 2026
c5b0caa
batch threshold 0.14 (86% fill before tier 1)
joshuaisaact Mar 23, 2026
50b619b
Revert "batch threshold 0.14 (86% fill before tier 1)"
joshuaisaact Mar 23, 2026
f97dad2
always try tier 0 first in insert to maximize get() hit rate
joshuaisaact Mar 23, 2026
5566394
Revert "always try tier 0 first in insert to maximize get() hit rate"
joshuaisaact Mar 23, 2026
8d00aa7
prefetch entries for probe 0 before fingerprint check
joshuaisaact Mar 23, 2026
ad3bf7c
also prefetch entries for probe 1 while checking probe 0
joshuaisaact Mar 23, 2026
5b39875
Revert "also prefetch entries for probe 1 while checking probe 0"
joshuaisaact Mar 23, 2026
4cc0deb
prefetch with locality=1 (low temporal) for entries
joshuaisaact Mar 23, 2026
97803b7
Revert "prefetch with locality=1 (low temporal) for entries"
joshuaisaact Mar 23, 2026
a355018
remove xor-shift from hash (just multiply)
joshuaisaact Mar 23, 2026
758f93f
use upper hash bits for bucket index (better distribution without xor…
joshuaisaact Mar 23, 2026
dd25176
merge probe 0 back into loop (prefetch before loop)
joshuaisaact Mar 23, 2026
713b666
reduce MAX_PROBES 7 to 6 (upper-bit hash has better distribution)
joshuaisaact Mar 23, 2026
e5a018e
Revert "reduce MAX_PROBES 7 to 6 (upper-bit hash has better distribut…
joshuaisaact Mar 23, 2026
56102de
branchless fingerprint clamping with @max/@min
joshuaisaact Mar 23, 2026
2722e1e
Revert "branchless fingerprint clamping with @max/@min"
joshuaisaact Mar 23, 2026
eba4f60
prefetch both fingerprints and entries for probe 0
joshuaisaact Mar 23, 2026
e431166
Revert "prefetch both fingerprints and entries for probe 0"
joshuaisaact Mar 23, 2026
248089c
use for range instead of while in get() probe loop
joshuaisaact Mar 23, 2026
d1db70b
inline for in get() probe loop (comptime unroll)
joshuaisaact Mar 23, 2026
927dc4a
Revert "inline for in get() probe loop (comptime unroll)"
joshuaisaact Mar 23, 2026
c51c9e9
batch threshold 0.10 (90% fill)
joshuaisaact Mar 23, 2026
c102884
Revert "batch threshold 0.10 (90% fill)"
joshuaisaact Mar 23, 2026
269d9c4
stride-3 probing to reduce secondary clustering
joshuaisaact Mar 23, 2026
d8c6338
Revert "stride-3 probing to reduce secondary clustering"
joshuaisaact Mar 23, 2026
56755e5
prefetch two cache lines of entries for probe 0
joshuaisaact Mar 23, 2026
ca489d3
Revert "prefetch two cache lines of entries for probe 0"
joshuaisaact Mar 23, 2026
df2c24a
try Murmur3 mixing constant 0xbf58476d1ce4e5b9
joshuaisaact Mar 23, 2026
21ed7db
Revert "try Murmur3 mixing constant 0xbf58476d1ce4e5b9"
joshuaisaact Mar 23, 2026
fa8f1ec
rigorous benchmark: random keys, fair capacity, median of 10, miss test
joshuaisaact Mar 23, 2026
573d1cf
abseil-style control encoding: empty=0x80, 7-bit fps, cheap early ter…
joshuaisaact Mar 23, 2026
76f7903
Revert "abseil-style control encoding: empty=0x80, 7-bit fps, cheap e…
joshuaisaact Mar 23, 2026
4c2dd16
early termination via matchEmpty (retry with random keys benchmark)
joshuaisaact Mar 23, 2026
456a5f0
Revert "early termination via matchEmpty (retry with random keys benc…
joshuaisaact Mar 23, 2026
4fe8fdb
move prefetch inside probe loop (abseil pattern: one per iteration)
joshuaisaact Mar 23, 2026
7a37536
Revert "move prefetch inside probe loop (abseil pattern: one per iter…
joshuaisaact Mar 23, 2026
5f0ba61
restore xor-shift in hash (may help random key distribution)
joshuaisaact Mar 23, 2026
3b49a27
xor-shift >> 28 for more upper-bit mixing
joshuaisaact Mar 23, 2026
86c8d0d
Revert "xor-shift >> 28 for more upper-bit mixing"
joshuaisaact Mar 23, 2026
7d2318e
add v2 experiment log and insights
joshuaisaact Mar 23, 2026
5e8eaeb
gitignore benchmark binaries and logs
joshuaisaact Mar 23, 2026
edf731b
independent fingerprint hash (second multiply, parallel on superscalar)
joshuaisaact Mar 23, 2026
7e51903
Revert "independent fingerprint hash (second multiply, parallel on su…
joshuaisaact Mar 23, 2026
a950f10
test 7-bit fingerprint from top bits (abseil H2 style)
joshuaisaact Mar 23, 2026
112a5c2
Revert "test 7-bit fingerprint from top bits (abseil H2 style)"
joshuaisaact Mar 23, 2026
b426a0c
update insights with honest benchmark findings
joshuaisaact Mar 23, 2026
b08912e
use findEmptyInBucket in insert (skip tombstone check during bulk ins…
joshuaisaact Mar 23, 2026
ac3797f
Revert "use findEmptyInBucket in insert (skip tombstone check during …
joshuaisaact Mar 23, 2026
35aff5d
paper-faithful multi-tier get(): search all tiers, not just tier 0
joshuaisaact Mar 23, 2026
178c3ad
limit multi-tier search to tier 0 + tier 1 only
joshuaisaact Mar 23, 2026
c5ae13d
tier 1: probe 0 only (minimal code, finds ~95% of tier-1 elements)
joshuaisaact Mar 23, 2026
1129bce
Revert "tier 1: probe 0 only (minimal code, finds ~95% of tier-1 elem…
joshuaisaact Mar 23, 2026
9188eba
Reapply "tier 1: probe 0 only (minimal code, finds ~95% of tier-1 ele…
joshuaisaact Mar 23, 2026
818f9bf
Revert "Reapply "tier 1: probe 0 only (minimal code, finds ~95% of ti…
joshuaisaact Mar 23, 2026
0df48a8
restore tier-0-only get() (multi-tier causes compiler cascading regre…
joshuaisaact Mar 23, 2026
d511474
update README with honest benchmark results and paper divergence notes
joshuaisaact Mar 23, 2026
aaa9168
add program-v3: paper-faithful multi-tier lookup research
joshuaisaact Mar 24, 2026
83f9649
noinline getSlowPath for tier 1+ search (keep get() I-cache footprint…
joshuaisaact Mar 24, 2026
b725080
limit getSlowPath to tier 1 only (tiers 2+ are empty at 99% load)
joshuaisaact Mar 24, 2026
65aee5b
Revert "limit getSlowPath to tier 1 only (tiers 2+ are empty at 99% l…
joshuaisaact Mar 24, 2026
48345e4
Reapply "limit getSlowPath to tier 1 only (tiers 2+ are empty at 99% …
joshuaisaact Mar 24, 2026
0c05a9b
restore tier-0-only get() as clean base for insert-side experiments
joshuaisaact Mar 24, 2026
2696554
MAX_PROBES 7 -> 8 (more elements fit in tier 0)
joshuaisaact Mar 24, 2026
93a99b8
batch threshold 0.05 (95% fill) with MAX_PROBES=8
joshuaisaact Mar 24, 2026
e5abd48
Revert "batch threshold 0.05 (95% fill) with MAX_PROBES=8"
joshuaisaact Mar 24, 2026
3a03edf
Reapply "batch threshold 0.05 (95% fill) with MAX_PROBES=8"
joshuaisaact Mar 24, 2026
c6f6467
restore clean v3 baseline (MAX_PROBES=7, threshold=0.12)
joshuaisaact Mar 24, 2026
7736f8b
paper-faithful get(): noinline cold-hinted getOtherTiers for tier 1+
joshuaisaact Mar 24, 2026
b2448b0
use opaque function pointer for tier 1+ overflow (prevent LLVM codege…
joshuaisaact Mar 24, 2026
906ae65
limit overflow to tier 1 only (tiers 2+ empty at 99% load)
joshuaisaact Mar 24, 2026
776cd39
log v3 experiments
joshuaisaact Mar 24, 2026
6f7d750
early termination in overflow function (safe with 100% find rate)
joshuaisaact Mar 24, 2026
32dd07a
early termination in tier-0 loop (jump to overflow on empty slot)
joshuaisaact Mar 24, 2026
9837b5d
Revert "early termination in tier-0 loop (jump to overflow on empty s…
joshuaisaact Mar 24, 2026
1c47413
log v3 early termination experiments
joshuaisaact Mar 24, 2026
94c1d05
v3 insights: elastic hash 40-65% faster than abseil at normal loads
joshuaisaact Mar 24, 2026
ad1bfed
add verification program: are these results real?
joshuaisaact Mar 24, 2026
ab25178
verification checks 1-3: capacity, shuffled access, size independence
joshuaisaact Mar 24, 2026
71f0fb9
complete verification: realistic workloads, hash cost, shuffled access
joshuaisaact Mar 24, 2026
7e7a711
update README and PR with verified benchmark results
joshuaisaact Mar 24, 2026
65683f7
verify: gcc vs clang abseil - no unfair compiler advantage
joshuaisaact Mar 24, 2026
f466836
string-key elastic hash: implementation, benchmarks, runner
joshuaisaact Mar 24, 2026
27303d9
string key verification: 36-97% faster than abseil even with shuffled…
joshuaisaact Mar 24, 2026
06ba5b9
comprehensive string key verification: advantage holds across lengths…
joshuaisaact Mar 24, 2026
d788da5
cross-language benchmark: elastic hash vs abseil, Rust hashbrown, Go …
joshuaisaact Mar 24, 2026
6aecf81
fair cross-language benchmark: ahash for Rust, pre-allocated strings …
joshuaisaact Mar 24, 2026
3be46ed
add M4 benchmark guide: test whether cache density advantage is x86-s…
joshuaisaact Mar 24, 2026
93444af
M4 benchmark: advantage grows to 2.59x (up from 1.74x on x86)
joshuaisaact Mar 24, 2026
a18b161
M4 cross-language size sweep: elastic hash wins 16K-4M against all co…
joshuaisaact Mar 24, 2026
8b93cff
add miss optimization research program
joshuaisaact Mar 24, 2026
5e4c283
branch-hinted matchEmpty: misses 4.6x faster, now beats abseil
joshuaisaact Mar 24, 2026
b98545c
final M4 results: elastic hash fastest on all operations at 50% load
joshuaisaact Mar 24, 2026
3701260
tombstone churn test: no degradation after 500K delete/insert cycles
joshuaisaact Mar 24, 2026
526d1e5
add miss optimization summary doc
joshuaisaact Mar 24, 2026
16c1b94
mixed workload benchmark: 2x faster than abseil under realistic churn
joshuaisaact Mar 24, 2026
7eca7b4
memory overhead: identical to abseil. key length: advantage grows wit…
joshuaisaact Mar 24, 2026
4a6dbfd
C++ port reveals: hit lookup advantage is mostly Zig codegen, not dat…
joshuaisaact Mar 25, 2026
2174943
add Rust port: confirms hit advantage is Zig codegen
joshuaisaact Mar 25, 2026
e8e987f
faithful C++ and Rust ports: raw pointers, NEON SIMD, full batch logic
joshuaisaact Mar 25, 2026
94d5daa
controlled experiment: probing and layout have minimal impact on lookups
joshuaisaact Mar 25, 2026
51284f2
fix controlled experiment: flat had 2x capacity, elastic had all-tier…
joshuaisaact Mar 25, 2026
5c64951
add comprehensive findings doc
joshuaisaact Mar 25, 2026
feb1655
growth policy test: resize check adds 37% insert overhead at 50% load
joshuaisaact Mar 25, 2026
e3beb44
fix duplicate keys and verify resize correctness
joshuaisaact Mar 25, 2026
f44dad2
update findings with growth policy overhead, Go results, and producti…
joshuaisaact Mar 25, 2026
b1f25ed
add contains, len, clear, getOrPut, iterator with tests
joshuaisaact Mar 25, 2026
2ace61a
single-pass insert: dedup + insert in one probe sequence
joshuaisaact Mar 25, 2026
303265f
fix max_probe_depth tracking in single-pass insert
joshuaisaact Mar 25, 2026
5589825
bloom filter experiment: not worth it, reverted from get()
joshuaisaact Mar 25, 2026
2fe9f29
generic ElasticHash: works with any key/value type
joshuaisaact Mar 25, 2026
d052a9c
remove Go benchmarks — not relevant for systems-level comparison
joshuaisaact Mar 27, 2026
53f8258
remove Go references from FINDINGS.md
joshuaisaact Mar 27, 2026
7f96daf
C++ size sweep: abseil 3.8x faster at 16K, elastic 2.9x faster at 256K
joshuaisaact Mar 27, 2026
d2a5e39
corrected benchmarks: clean sequential C++ runs at all sizes and loads
joshuaisaact Mar 27, 2026
215aced
adversarial disproval program: attack elastic hash performance claims
joshuaisaact Mar 28, 2026
d9529f2
adversarial disproval: the advantage is real but the name is wrong
joshuaisaact Mar 28, 2026
fa29ef6
robust latency verification: p99 hit latency 10-21x worse than abseil…
joshuaisaact Mar 28, 2026
5 changes: 5 additions & 0 deletions .gitignore
@@ -1,3 +1,8 @@
.zig-cache
zig-out
.claude
NOTES.md
bench.log
bench-abseil
abseil-v2.log
elastic-v2.log
17 changes: 17 additions & 0 deletions AGENTS.md
@@ -0,0 +1,17 @@
# AGENTS.md

## Frozen files

`src/simple.zig` and `src/bench.zig` are reference implementations. Do not modify them.

## Autoresearch programs

`program.md` and `program-v2.md` are autonomous agent programs (not documentation). When asked to "start" or "run" one, read it fully and execute its loop. Each defines its own set of frozen files, editable files, and keep/revert criteria -- read before editing anything.

## Zig skills

The skills `zig-perf`, `zig-quality`, `zig-safety`, `zig-style`, and `zig-testing` are available globally.

## Abseil comparison benchmarks

The abseil benchmark (`bench-abseil.cpp`, created by program-v2) requires system-installed `abseil-cpp` with pkg-config modules: `absl_hash`, `absl_raw_hash_set`, `absl_hashtablez_sampler`.
170 changes: 170 additions & 0 deletions BENCHMARK-M4.md
@@ -0,0 +1,170 @@
# Running benchmarks on Apple Silicon M4

## What we're testing

On x86 with ~512KB L2, elastic hash beats abseil by 36-97% on string lookups because our tier-0 fingerprints (1MB) fit in L2 while abseil's control bytes (2MB) spill to L3.

M4 has ~16MB shared L2. Both arrays should fit in L2. If the advantage disappears, the result is cache-density-specific. If it persists, something deeper is happening.

## Setup

### Install dependencies

```bash
# Zig
brew install zig

# Abseil
brew install abseil

# Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Go
brew install go
```

### Clone and checkout

```bash
git clone https://github.com/joshuaisaact/elastic-hash.git
cd elastic-hash
git checkout autoresearch/cross-language-bench
```

### Build everything

```bash
# Abseil benchmark
# Note: pkg-config paths may differ on macOS. Try:
g++ -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \
bench-abseil-strings.cpp -o bench-abseil-strings \
$(pkg-config --cflags --libs absl_hash absl_raw_hash_set absl_hashtablez_sampler)

# If pkg-config doesn't work, try:
# g++ -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \
# bench-abseil-strings.cpp -o bench-abseil-strings \
# -I/opt/homebrew/include -L/opt/homebrew/lib \
# -labsl_hash -labsl_raw_hash_set -labsl_hashtablez_sampler \
# -labsl_city -labsl_low_level_hash -labsl_strings -labsl_int128 \
# -labsl_base -labsl_throw_delegate -labsl_raw_logging_internal

# Elastic hash (Zig)
zig build test # verify tests pass
zig build autobench-strings -Doptimize=ReleaseFast # just to check it builds

# Rust
cd bench-rust && cargo build --release && cd ..

# Go
cd bench-go && go build -o bench-go . && cd ..
```

## Run the benchmarks

### Quick test (just abseil vs elastic at 1M 50%)

```bash
bash bench-strings.sh
```

### Full cross-language comparison

Run each one and save the output:

```bash
# Abseil
./bench-abseil-strings > results-m4-abseil.log 2>/dev/null
cat results-m4-abseil.log

# Elastic hash
zig build autobench-strings -Doptimize=ReleaseFast 2> results-m4-elastic.log
cat results-m4-elastic.log

# Rust (with ahash)
./bench-rust/target/release/bench-hashbrown > results-m4-rust.log 2>/dev/null
cat results-m4-rust.log

# Go
./bench-go/bench-go > results-m4-go.log 2>/dev/null
cat results-m4-go.log
```

### Shuffled verification (the most important test)

```bash
# Abseil shuffled
g++ -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \
bench-strings-verify.cpp -o bench-strings-verify \
$(pkg-config --cflags --libs absl_hash absl_raw_hash_set absl_hashtablez_sampler)
./bench-strings-verify

# Elastic hash shuffled (swap autobench temporarily)
cp src/autobench.zig src/autobench.zig.bak
cp src/autobench-strings-verify.zig src/autobench.zig
zig build autobench -Doptimize=ReleaseFast 2>&1 | grep ELASTIC
cp src/autobench.zig.bak src/autobench.zig
rm src/autobench.zig.bak
```

## What to look for

### Prediction: advantage shrinks or disappears on M4

M4's ~16MB L2 fits both our 1MB fingerprints AND abseil's 2MB control bytes. The L2 vs L3 cache density advantage that drives our x86 results should not apply.

If the shuffled hit lookup gap at 1M 50% is:
- **> 1.3x**: The advantage is NOT just cache density. Something else is going on.
- **1.0-1.3x**: Advantage shrinks as predicted. Cache density was the main factor.
- **< 1.0x**: Abseil wins on M4. Our architecture only helps on small-L2 x86.

### Also check

- Does the size-dependent pattern hold? (Advantage at 1M but not 100K or 4M?)
- Is Rust+ahash still faster than abseil on M4?
- Does Go's performance change relative to the native-compiled implementations?

## Results

### Shuffled hit lookup (the key test)

| Load | Elastic (Zig), us | Abseil (C++), us | M4 ratio | x86 ratio |
|------|--------------|-------------|----------|-----------|
| 10% | 719 | 2,861 | **3.98x** | 1.97x |
| 25% | 2,276 | 10,169 | **4.47x** | 1.86x |
| 50% | 8,863 | 22,984 | **2.59x** | 1.74x |
| 75% | 15,972 | 33,624 | **2.11x** | 1.61x |
| 90% | 22,118 | 41,671 | **1.88x** | 1.50x |
| 99% | 25,748 | 46,543 | **1.81x** | 1.36x |

### Verdict

The prediction was wrong. The advantage is **not** cache-density-specific. At 50% load the gap went from 1.74x on x86 to 2.59x on M4 -- it grew by 49%.

The mechanism is cache lines touched per probe, not which cache level the data lives in. Separated, dense fingerprint arrays mean fewer cache line fetches under random access, and this holds regardless of L2 size.

### x86 reference (from Linux, AMD/Intel ~512KB L2)

| Load | Elastic, us | Abseil, us | Rust+ahash, us | Go swiss, us |
|------|---------|--------|-----------|---------|
| 50% | 11,119 | 19,312 | 16,235 | 25,304 |
| 99% | 33,318 | 45,404 | 36,292 | 57,488 |

Gap at 50%: elastic 1.74x faster than abseil, 1.46x faster than Rust+ahash.

## Troubleshooting

### abseil won't build on macOS

Try `brew install abseil` then check `pkg-config --libs absl_hash`. If pkg-config can't find it:
```bash
export PKG_CONFIG_PATH="/opt/homebrew/lib/pkgconfig:$PKG_CONFIG_PATH"
```

### Zig SIMD on ARM

Zig's `@Vector` operations compile to ARM NEON on aarch64. The SIMD fingerprint matching should work without changes, but the generated instructions differ from SSE2. If tests fail, there may be an alignment or endianness issue.
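
For reference, the fingerprint match is just a bucket-wide byte compare. This portable C++ sketch is the scalar equivalent (a `BUCKET_SIZE` of 16 is an assumption for this sketch); both g++ and clang auto-vectorize the loop, to NEON on aarch64 and SSE2 on x86, which is effectively what `@Vector` lowers to:

```cpp
#include <cstdint>

// Compare all fingerprints in a bucket against `fp` and return a
// bitmask of matching slot positions. This is the scalar equivalent
// of the SIMD fingerprint match; the loop has no cross-iteration
// dependency, so compilers vectorize it at -O2/-O3.
constexpr int BUCKET_SIZE = 16;  // assumed bucket width for this sketch

uint32_t match_fp(const uint8_t* bucket_fps, uint8_t fp) {
    uint32_t mask = 0;
    for (int i = 0; i < BUCKET_SIZE; ++i)
        mask |= static_cast<uint32_t>(bucket_fps[i] == fp) << i;
    return mask;
}
```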

### Go swiss.Map crashes

If `swiss.Map` crashes with a segfault, ensure you're using pre-allocated strings (the current code on this branch already does this).
144 changes: 144 additions & 0 deletions FINDINGS.md
@@ -0,0 +1,144 @@
# Elastic Hash: What We Actually Know

## What is it

A hash table based on the 2025 paper "Optimal Bounds for Open Addressing Without Reordering" (Farach-Colton, Krapivin, Kuszmaul). The paper's contribution is tiered arrays that prevent cluster merging, giving O(log² (1/δ)) worst-case probes, where δ is the fraction of free slots, instead of the O(n) that merged clusters can cost. Our implementation adds SIMD fingerprint matching, a separated metadata layout, and a cold-hinted early-termination optimization.
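
The tiered-array idea can be sketched as follows. The halving ratio and the 64-slot floor here are assumptions for illustration, not the repo's exact constants:

```cpp
#include <cstddef>
#include <vector>

// Sketch of the tiered layout: each tier is its own open-addressing
// region, roughly half the size of the previous one. An insert that
// exhausts its probe budget in tier i falls through to tier i+1, so
// a cluster in one tier can never merge with a cluster in another.
struct Tiers {
    std::vector<std::size_t> tier_offset;  // start index of each tier
    std::vector<std::size_t> tier_size;    // slot count of each tier

    explicit Tiers(std::size_t capacity) {
        std::size_t offset = 0, size = capacity / 2;
        while (size >= 64) {               // stop at a small final tier
            tier_offset.push_back(offset);
            tier_size.push_back(size);
            offset += size;
            size /= 2;                     // geometric halving
        }
    }
};
```

For a 2^20-slot table this yields tiers of 2^19, 2^18, ... slots; tier 0 holds the bulk of the elements, which is why the tier-0-only `get()` experiments above found almost everything there.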

## The bottom line

Elastic hash is **1.7x faster on hit lookups** and **2.6x faster on inserts** than Google's abseil `flat_hash_map` (SwissTable). Verified in both C++ (same compiler as abseil) and Zig. Miss lookups are roughly tied with abseil at moderate load.

## What makes it faster (and what doesn't)

### What helps

**Separated dense fingerprint arrays.** Fingerprints (1 byte per slot) live in a contiguous array, separate from entry data (24 bytes per slot). One cache line covers 64 fingerprint slots. This means fewer cache line fetches per probe under random access. This is the single biggest contributor to the lookup advantage. Verified: the advantage persists across x86 and ARM (M4), ruling out cache-level effects.
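
The arithmetic is easy to check. The 24-byte entry below is an assumption for this sketch (an 8-byte key plus a 16-byte value, matching the interleaved-entries commit), not the repo's exact type:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 24-byte slot matching the "24 bytes per slot" figure above.
struct Entry { uint64_t key; char value[16]; };

constexpr std::size_t CACHE_LINE = 64;

// Interleaved layout: one 64-byte cache line holds only 2 full entries,
// so each probe of a new slot is likely a fresh cache line fetch.
static_assert(CACHE_LINE / sizeof(Entry) == 2, "2 entries per line");

// Separated layout: 1-byte fingerprints mean one line covers 64
// candidate slots before any entry data is touched.
static_assert(CACHE_LINE / sizeof(uint8_t) == 64, "64 fingerprints per line");
```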

**Simpler insert path.** Abseil's `emplace()` includes growth policy checks, rehash infrastructure, and hashtablez sampling overhead. Elastic hash finds an empty slot and writes. At normal loads, first probe usually succeeds. This is why inserts are 2.6x faster — less bookkeeping, not a better algorithm.

**Simpler delete.** Tombstone marking (one byte write) vs abseil's find-then-erase. Consistently 2-4x faster across all languages.
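
A minimal sketch of the tombstone delete, with assumed marker values (the real Zig code differs in names and encoding):

```cpp
#include <cstddef>
#include <cstdint>

// Assumed control-byte encoding for this sketch: 0 = empty,
// 1 = tombstone, anything else = a live fingerprint.
constexpr uint8_t TOMBSTONE = 1;

// Deleting a found slot is a single one-byte store into the dense
// fingerprint array. No entry data moves and no backward-shift pass
// runs; lookups skip the tombstone and later inserts may recycle it.
void erase_slot(uint8_t* fps, std::size_t i) {
    fps[i] = TOMBSTONE;
}
```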

**Cold-hinted early termination for misses.** Adding `matchEmpty` after each probe with `@branchHint(.cold)` (Zig) or `__builtin_expect(..., 0)` (C++) lets miss lookups terminate early without hurting hit performance. The branch predictor learns to predict "not taken," making the check free on hits. This was tried and reverted 5 times before the cold hint was added — the hint is essential.
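
A minimal C++ sketch of the pattern, scalar rather than SIMD so the hint placement is visible; the table shape, helper names, and empty marker are assumptions, not the repo's code:

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint8_t EMPTY = 0;  // assumed marker for an empty slot

// Returns the slot index of `key`, or SIZE_MAX on a miss.
// `fps` is the dense fingerprint array, `mask` = table_size - 1.
size_t find(const uint8_t* fps, const uint64_t* keys,
            size_t mask, uint64_t key, uint8_t fp, size_t max_probes) {
    size_t bucket = key & mask;  // stand-in for the real hash
    for (size_t p = 0; p < max_probes; ++p) {
        size_t i = (bucket + p) & mask;
        if (fps[i] == fp && keys[i] == key) return i;
        // The cold-hinted check: on hit-heavy workloads the predictor
        // learns "not taken", so this costs almost nothing on hits,
        // but a miss stops at the first empty slot instead of probing
        // all the way to max_probes.
        if (__builtin_expect(fps[i] == EMPTY, 0)) return SIZE_MAX;
    }
    return SIZE_MAX;
}
```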

### What doesn't help

**The tiered layout itself.** A controlled C++ experiment (4 implementations: tiered vs flat × linear vs triangular probing, same compiler, same SIMD) showed that tiered and flat perform similarly when both use separated fingerprints. The tiered layout's contribution is worst-case guarantees at very high load, not average-case speed.

**Probing strategy.** Linear vs triangular probing produces identical performance within noise. The probing pattern doesn't matter for these workloads.

**The Zig compiler.** The Zig version is only ~5-10% faster than the C++ port of the same algorithm. The bulk of the advantage (1.7x) is already present in C++ compiled with g++.

## Performance by operation (1M elements, 50% load)

### C++ elastic hash vs abseil (same g++ compiler, unshuffled)

| Operation | Elastic C++ | Abseil | Ratio |
|---|---|---|---|
| Hit lookup | 5,066us | 8,700us | **1.72x faster** |
| Miss lookup | 2,731us | 2,899us | **1.06x faster** |
| Insert | ~3,900us | 10,288us | **2.64x faster** |

### Zig elastic hash vs abseil (unshuffled)

| Operation | Elastic Zig | Abseil | Ratio |
|---|---|---|---|
| Hit lookup | 4,736us | 8,700us | **1.84x faster** |
| Miss lookup | 2,529us | 2,899us | **1.15x faster** |
| Insert | 3,677us | 10,288us | **2.80x faster** |
| Delete | 1,528us | 5,749us | **3.76x faster** |

### Go elastic hash vs Go swiss.Map (shuffled)

| Operation | Elastic Go | swiss.Map | Ratio |
|---|---|---|---|
| Hit lookup | 52,600us | 31,600us | **1.66x slower** |
| Miss lookup | 49,150us | 24,200us | **2.03x slower** |
| Insert | 14,500us | 25,400us | **1.75x faster** |
| Delete | 9,900us | 13,200us | **1.33x faster** |

## Performance across table sizes (C++ elastic vs abseil, unshuffled, 50% load)

Tested on Apple M4 with same g++ compiler. Clean sequential runs.

| Size | Elastic C++ hit | Abseil hit | Hit ratio | Elastic miss | Abseil miss | Miss ratio |
|---|---|---|---|---|---|---|
| 16K | 26us | 30us | ~tied | 22us | 23us | ~tied |
| 64K | 110us | 164us | **1.5x faster** | 92us | 108us | **1.2x faster** |
| 256K | 493us | 2,150us | **4.4x faster** | 408us | 660us | **1.6x faster** |
| 1M | 5,684us | 9,570us | **1.7x faster** | 3,122us | 3,258us | ~tied |
| 4M | 37,817us | 49,888us | **1.3x faster** | 18,160us | 27,430us | **1.5x faster** |

Elastic hash is faster or tied at every size from 16K to 4M on hits. Peak advantage is 256K (4.4x).

## Performance across load factors (C++ elastic vs abseil, unshuffled, 1M)

| Load | Elastic hit | Abseil hit | Hit ratio | Elastic miss | Abseil miss | Miss ratio |
|---|---|---|---|---|---|---|
| 10% | 419us | 1,734us | **4.1x faster** | 315us | 394us | **1.3x faster** |
| 25% | 1,898us | 4,743us | **2.5x faster** | 993us | 1,454us | **1.5x faster** |
| 50% | 5,684us | 9,570us | **1.7x faster** | 3,122us | 3,258us | ~tied |
| 75% | 10,704us | 14,157us | **1.3x faster** | 9,397us | 5,817us | **abseil 1.6x faster** |
| 90% | 15,333us | 17,781us | **1.2x faster** | 25,355us | 7,979us | **abseil 3.2x faster** |
| 99% | 19,791us | 19,263us | ~tied | 36,306us | 9,319us | **abseil 3.9x faster** |

Elastic hash wins on hits at every load factor up to 90%. The miss advantage disappears above 50% load because fewer empty slots mean `matchEmpty` can no longer terminate probes early. At 75% load and above, abseil is 1.6-3.9x faster on misses.

## Cross-architecture results

| Platform | L2 Cache | Hit advantage at 50% |
|---|---|---|
| x86 (Linux) | ~512KB | ~1.7x |
| Apple M4 | ~16MB | ~1.7x |

The advantage is architecture-independent. It's not about fitting in a specific cache level — it's about cache lines per probe.

## Variable key lengths (Zig vs abseil, shuffled, M4)

| Key length | Hit advantage |
|---|---|
| 8 bytes | 1.3x |
| 16 bytes | ~tied (shuffled) to 1.7x (sequential) |
| 32 bytes | 1.4x |
| 64 bytes | 1.6x |
| 128 bytes | 2.0x |
| 256 bytes | 2.0x |

Advantage grows with key length because fingerprint pre-filtering skips more expensive key comparisons.
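A minimal sketch of fingerprint pre-filtering (names and layout are illustrative, not the repo's actual API): a 1-byte fingerprint derived from the hash is checked before the full key, so almost all non-matching slots are rejected without reading any key bytes.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical slot layout; real implementations store fingerprints in a
// separate dense array so a probe touches fewer cache lines.
struct Slot {
    uint8_t fp;        // low 8 bits of the key's hash
    std::string key;
    int value;
};

inline uint8_t fingerprint(uint64_t hash) { return static_cast<uint8_t>(hash & 0xFF); }

int* find(std::vector<Slot>& slots, const std::string& key, uint64_t hash) {
    uint8_t fp = fingerprint(hash);
    for (auto& s : slots) {
        if (s.fp != fp) continue;          // cheap 1-byte reject: key bytes never read
        if (s.key == key) return &s.value; // full comparison only on fp match
    }
    return nullptr;
}
```

The longer the key, the more a full comparison costs, while the 1-byte filter still rejects roughly 255 of every 256 non-matching slots, which is why the advantage grows with key length.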

## Stability

**Tombstone churn:** 500K delete-insert cycles at 50% load. No degradation. Tombstones get recycled by subsequent inserts.

**Mixed workload** (40% hit, 40% miss, 10% insert, 10% delete): Zig elastic hash sustains 50M ops/sec vs abseil's 25M. The C++ elastic hash would land proportionally closer, but still ahead thanks to its insert/delete advantage.

**Memory:** Identical to abseil. 1.00x at every capacity tested.

## Growth policy overhead

Adding an abseil-style resize check (`count * 8 > capacity * 7`) to every insert adds 37% overhead at 50% load, even when resize never triggers. This is just the branch — not the resize itself. Abseil's insert path has this plus hashtablez sampling plus other bookkeeping. The simpler insert path accounts for a significant chunk of the 2.6x insert advantage.

Beyond the growth policy, the production-ready version (`string_hybrid_growth.zig`) also handles:
- **Duplicate keys:** insert checks for existing key and updates value instead of creating a second entry
- **Automatic resize:** doubles capacity when load exceeds 87.5% and rehashes all elements

## What we got wrong along the way

1. **"Cache density — fingerprints fit in L2, abseil's don't."** Wrong. M4 with 16MB L2 showed the same advantage. It's cache lines per probe, not cache level.

2. **"The tiered layout prevents cluster merging, making it faster."** Partly true for worst-case theory, but the controlled experiment showed flat layout performs the same when given the same fingerprint design.

3. **"Zig's compiler makes it 2x faster."** Mostly wrong. The C++ port shows 1.7x — only 5-10% comes from Zig's compiler. The earlier "2x compiler advantage" claim was comparing Zig unshuffled numbers against C++ shuffled numbers (different access patterns).

4. **"Elastic hash is 3-5x faster on inserts because of the tiered architecture."** Misleading. The insert advantage comes from simpler code paths (no growth policy, no rehash checks), not from the tiered layout. Abseil's insert overhead is abseil-specific.

5. **"Flat tables with 2x capacity is a fair comparison."** Wrong. This halved the effective load factor, invalidating the controlled experiment. Caught and fixed.

6. **"Abseil is 3.8x faster at 16K."** Wrong. An earlier run with machine contention produced bad data. Clean sequential runs show they're tied at 16K.

## So what

**If you use C++, Zig, or Rust** and need a hash table for read-heavy or write-heavy workloads at moderate load factors (10-75%), this implementation is meaningfully faster than abseil. The 1.7x lookup and 2.6x insert advantages are real and verified.

**The portable insight** is the cold-hinted early termination. Any SIMD hash table (including abseil forks) can add `matchEmpty` with a cold branch hint to improve miss performance without regressing hits. This is a one-line optimization that works in any language with branch prediction hints.

**The paper's contribution** is the theoretical worst-case guarantee, not average-case speed. The practical speed comes from implementation choices (separated fingerprints, simple operations) that could be applied to other hash table designs.