Performance optimizations by chris-peterson · Pull Request #51 · chris-peterson/spiffy

chris-peterson · 2026-03-22T07:08:36Z

Problem

EventContext allocates heavily per event — 12,450 bytes/op in
v6.4.7 — driven by three ConcurrentDictionaries, a Stopwatch
object, LINQ-based sorting in Render(), and multiple dictionary
iteration passes for key normalization. At high event volumes
this creates significant GC pressure.

Approach

Targeted the allocation and compute hot paths while preserving
the existing thread-safety contract:

- Storage: Replaced ConcurrentDictionary with flat
  string[]/object[] arrays guarded by lock. Linear scan
  for key lookup is faster than hash tables at typical field
  counts (<20). Insertion order is preserved naturally,
  eliminating the sort in Render().
- Timer: Replaced per-event Stopwatch allocation with
  static Stopwatch.GetTimestamp() arithmetic.
- Lazy allocation: TimerCollection, _counts, and
  PrivateData are only allocated when actually used.
- Lock reduction: Constructor writes bypass locking since
  the object isn't shared yet. Internal SetCore path avoids
  double-locking in Render.
- Render path: Single-pass key normalization (merged three
  separate dictionary scans), ThreadStatic StringBuilder
  reuse, cached quote-char strings.
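
To illustrate the Timer item, here is a minimal sketch of measuring elapsed time from raw timestamps instead of allocating a Stopwatch instance. ElapsedTimer is a hypothetical name used only for this example; the actual AutoTimer.cs changes in this PR may be structured differently.

```csharp
using System;
using System.Diagnostics;

// Hypothetical value type for illustration; not the type used in the PR.
public readonly struct ElapsedTimer
{
    private readonly long _start;

    private ElapsedTimer(long start) => _start = start;

    // Captures the current high-resolution tick count. Because this is a
    // struct holding one long, starting a timer needs no heap allocation.
    public static ElapsedTimer StartNew() =>
        new ElapsedTimer(Stopwatch.GetTimestamp());

    // Converts the tick delta to milliseconds. Stopwatch.Frequency is the
    // number of timer ticks per second.
    public double ElapsedMilliseconds =>
        (Stopwatch.GetTimestamp() - _start) * 1000.0 / Stopwatch.Frequency;
}
```

A caller would write `var t = ElapsedTimer.StartNew();` at the start of an event and read `t.ElapsedMilliseconds` when rendering it.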

Trade-off: FindKey is O(n) vs O(1) hash lookup. For the
typical 10-20 fields per event, the cache-friendly linear scan
is net faster because it avoids Dictionary's allocation and
hash overhead.
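
The storage approach and its trade-off can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in EventContext.cs: FlatFieldStore, Set, TryGet, and KeysInOrder are hypothetical names, the initial capacity of 16 is arbitrary, and ordinal key comparison is assumed. Only FindKey is a name taken from the description above.

```csharp
using System;

// Hypothetical sketch of lock-guarded parallel arrays; the real
// EventContext in this PR may differ in names and details.
public sealed class FlatFieldStore
{
    private readonly object _gate = new object();
    private string[] _keys = new string[16];   // assumed initial capacity
    private object[] _values = new object[16];
    private int _count;

    // O(n) linear scan. The caller must hold _gate.
    // Returns -1 when the key is absent.
    private int FindKey(string key)
    {
        for (int i = 0; i < _count; i++)
        {
            if (string.Equals(_keys[i], key, StringComparison.Ordinal))
                return i;
        }
        return -1;
    }

    // Overwrites an existing key in place, or appends a new one at the end,
    // so the arrays always reflect first-insertion order.
    public void Set(string key, object value)
    {
        lock (_gate)
        {
            int index = FindKey(key);
            if (index >= 0)
            {
                _values[index] = value;
                return;
            }
            if (_count == _keys.Length)
            {
                Array.Resize(ref _keys, _count * 2);
                Array.Resize(ref _values, _count * 2);
            }
            _keys[_count] = key;
            _values[_count] = value;
            _count++;
        }
    }

    public bool TryGet(string key, out object value)
    {
        lock (_gate)
        {
            int index = FindKey(key);
            value = index >= 0 ? _values[index] : null;
            return index >= 0;
        }
    }

    // Snapshot of the keys in insertion order; no sorting step is needed.
    public string[] KeysInOrder()
    {
        lock (_gate)
        {
            var copy = new string[_count];
            Array.Copy(_keys, copy, _count);
            return copy;
        }
    }
}
```

With this layout, a store that is never grown past its initial capacity allocates only the two arrays, and every lookup walks contiguous memory. The cost is that lookups become slower than hashing once the field count grows well beyond the typical range.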

Results (.NET 10, 60s sustained)

| Metric     | v6.4.7       | This PR                |
|------------|--------------|------------------------|
| Throughput | 179K ops/sec | 1,721K ops/sec (9.6x)  |
| Latency    | 5.60 µs/op   | 0.58 µs/op             |
| Bytes/op   | 12,450       | 1,272 (-90%)           |
| GC gen1    | 469          | 105 (-78%)             |

Review guide

Start with EventContext.cs — the core storage change from
ConcurrentDictionary to flat arrays, SetCore/FindKey/
AppendEntry, and the restructured Render(). Then
TimerCollection.cs (same ConcurrentDictionary removal pattern),
AutoTimer.cs (Stopwatch elimination), and
StringExtensions.cs (quote-char caching). The
Benchmarks.Baseline/ project is a standalone harness that
references the 6.4.7 NuGet package for comparison.

Testing

All 56 existing unit tests pass. Throughput validated via
60-second sustained benchmark runs comparing against the
published 6.4.7 NuGet package baseline.

Replace ConcurrentDictionary with lock-guarded flat arrays for EventContext field storage, preserving thread safety while eliminating hash table overhead and sort-on-render. Stopwatch replaced with Stopwatch.GetTimestamp() to avoid per-event object allocation. Lazy initialization for timers, counts, and PrivateData skips allocation when unused. Constructor batches field writes without locking since the object isn't yet shared. ThreadStatic StringBuilder reuse and single-pass key normalization reduce render allocations. Throughput: 179K -> 1,721K ops/sec (9.6x vs 6.4.7 release). Allocation: 12,450 -> 1,272 bytes/op (89.8% reduction). Adds benchmark harness and Benchmarks.Baseline project for NuGet package comparison.

chris-peterson force-pushed the claude/add-performance-harness-bx8YU branch 2 times, most recently from 84290eb to 197262a (March 23, 2026 02:04)

chris-peterson force-pushed the claude/add-performance-harness-bx8YU branch from 197262a to 2dd66ba (March 23, 2026 21:44)

chris-peterson wants to merge 1 commit into main from claude/add-performance-harness-bx8YU
