Performance optimizations #51

Draft
chris-peterson wants to merge 1 commit into main from claude/add-performance-harness-bx8YU

chris-peterson commented Mar 22, 2026

Problem

EventContext allocates heavily per event — 12,450 bytes/op in
v6.4.7 — driven by three ConcurrentDictionaries, a Stopwatch
object, LINQ-based sorting in Render(), and multiple dictionary
iteration passes for key normalization. At high event volumes
this creates significant GC pressure.

Approach

Targeted the allocation and compute hot paths while preserving
the existing thread-safety contract:

  • Storage: Replaced ConcurrentDictionary with flat
    string[]/object[] arrays guarded by lock. Linear scan
    for key lookup is faster than hash tables at typical field
    counts (<20). Insertion order is preserved naturally,
    eliminating the sort in Render().
  • Timer: Replaced per-event Stopwatch allocation with
    static Stopwatch.GetTimestamp() arithmetic.
  • Lazy allocation: TimerCollection, _counts, and
    PrivateData are only allocated when actually used.
  • Lock reduction: Constructor writes bypass locking since
    the object isn't shared yet. Internal SetCore path avoids
    double-locking in Render.
  • Render path: Single-pass key normalization (merged three
    separate dictionary scans), ThreadStatic StringBuilder
    reuse, cached quote-char strings.
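The timer and StringBuilder items above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `TimestampTimer` and `BuilderCache` are hypothetical names, and the 256 and 4096 capacity values are assumptions chosen for the example.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical value type: stores one long instead of allocating a Stopwatch object.
public readonly struct TimestampTimer
{
    private readonly long _start;

    private TimestampTimer(long start) => _start = start;

    public static TimestampTimer StartNew() => new TimestampTimer(Stopwatch.GetTimestamp());

    // Converts the raw tick delta to a TimeSpan using the timer frequency (ticks per second).
    public TimeSpan Elapsed
    {
        get
        {
            long delta = Stopwatch.GetTimestamp() - _start;
            return TimeSpan.FromSeconds((double)delta / Stopwatch.Frequency);
        }
    }
}

// Hypothetical per-thread cache: each thread reuses one StringBuilder across renders.
public static class BuilderCache
{
    [ThreadStatic] private static StringBuilder? t_cached;

    public static StringBuilder Acquire()
    {
        StringBuilder? sb = t_cached;
        if (sb == null) return new StringBuilder(256); // assumed initial capacity
        t_cached = null; // hand the instance out exclusively until it is released
        sb.Clear();
        return sb;
    }

    public static string GetStringAndRelease(StringBuilder sb)
    {
        string result = sb.ToString();
        // Assumed cap: do not keep unusually large buffers alive on the thread.
        if (sb.Capacity <= 4096) t_cached = sb;
        return result;
    }
}
```

Because the cached builder is thread-static, no lock is needed around `Acquire` and `GetStringAndRelease`; the cost is one retained buffer per thread that renders events.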

Trade-off: FindKey is an O(n) linear scan instead of an O(1)
hash lookup. For the typical 10-20 fields per event, the
cache-friendly scan is net faster because it avoids the
dictionary's allocation and hashing overhead.
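To make the trade-off concrete, here is a minimal sketch of lock-guarded parallel arrays with a linear key scan. `FieldStore` is a hypothetical stand-in for the storage inside EventContext; the initial capacity of 8, the ordinal key comparison, and the `key=value` render format are assumptions for illustration only.

```csharp
using System;
using System.Text;

// Hypothetical illustration of flat-array field storage; not the PR's EventContext.
public sealed class FieldStore
{
    private readonly object _gate = new object();
    private string[] _keys = new string[8];   // assumed initial capacity
    private object[] _values = new object[8];
    private int _count;

    public void Set(string key, object value)
    {
        lock (_gate)
        {
            int index = FindKey(key);
            if (index >= 0)
            {
                _values[index] = value; // overwrite in place, position unchanged
                return;
            }
            if (_count == _keys.Length)
            {
                Array.Resize(ref _keys, _count * 2);
                Array.Resize(ref _values, _count * 2);
            }
            _keys[_count] = key;
            _values[_count] = value;
            _count++;
        }
    }

    // Fields come out in insertion order, so no sort is needed here.
    public string Render()
    {
        lock (_gate)
        {
            var sb = new StringBuilder();
            for (int i = 0; i < _count; i++)
            {
                if (i > 0) sb.Append(' ');
                sb.Append(_keys[i]).Append('=').Append(_values[i]);
            }
            return sb.ToString();
        }
    }

    // O(n) scan over a small contiguous array. Caller must hold _gate.
    private int FindKey(string key)
    {
        for (int i = 0; i < _count; i++)
        {
            if (string.Equals(_keys[i], key, StringComparison.Ordinal)) return i;
        }
        return -1;
    }
}
```

Setting `b`, then `a`, then `b` again leaves two entries with `b` still first, which is the insertion-order property the description relies on to drop the sort in Render().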

Results (.NET 10, 60s sustained)

| Metric     | v6.4.7       | This PR                |
|------------|--------------|------------------------|
| Throughput | 179K ops/sec | 1,721K ops/sec (9.6x)  |
| Latency    | 5.60 µs/op   | 0.58 µs/op             |
| Bytes/op   | 12,450       | 1,272 (-90%)           |
| GC gen1    | 469          | 105 (-78%)             |
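The derived figures in the table (the speedup factor and the percentage reductions) follow from the raw measurements. A small helper, with hypothetical names, shows the arithmetic:

```csharp
using System;

// Hypothetical helper for reproducing the derived columns from the raw numbers.
public static class BenchmarkMath
{
    // Ratio of new throughput to baseline throughput.
    public static double Speedup(double baselineOpsPerSec, double currentOpsPerSec)
        => currentOpsPerSec / baselineOpsPerSec;

    // Percentage by which a metric dropped relative to the baseline.
    public static double PercentReduction(double baseline, double current)
        => (1.0 - current / baseline) * 100.0;
}
```

With the table's inputs, `Speedup(179_000, 1_721_000)` is about 9.61, `PercentReduction(12_450, 1_272)` is about 89.8, and `PercentReduction(469, 105)` is about 77.6, matching the rounded values shown.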

Review guide

Start with EventContext.cs — the core storage change from
ConcurrentDictionary to flat arrays, SetCore/FindKey/
AppendEntry, and the restructured Render(). Then
TimerCollection.cs (same ConcurrentDictionary removal pattern),
AutoTimer.cs (Stopwatch elimination), and
StringExtensions.cs (quote-char caching). The
Benchmarks.Baseline/ project is a standalone harness that
references the 6.4.7 NuGet package for comparison.

Testing

All 56 existing unit tests pass. Throughput validated via
60-second sustained benchmark runs comparing against the
published 6.4.7 NuGet package baseline.

chris-peterson force-pushed the claude/add-performance-harness-bx8YU branch 2 times, most recently from 84290eb to 197262a (March 23, 2026 02:04)
Replace ConcurrentDictionary with lock-guarded flat arrays
for EventContext field storage, preserving thread safety
while eliminating hash table overhead and sort-on-render.
Stopwatch replaced with Stopwatch.GetTimestamp() to avoid
per-event object allocation. Lazy initialization for timers,
counts, and PrivateData skips allocation when unused.
Constructor batches field writes without locking since the
object isn't yet shared. ThreadStatic StringBuilder reuse
and single-pass key normalization reduce render allocations.

Throughput: 179K -> 1,721K ops/sec (9.6x vs 6.4.7 release).
Allocation: 12,450 -> 1,272 bytes/op (89.8% reduction).
Adds benchmark harness and Benchmarks.Baseline project for
NuGet package comparison.
chris-peterson force-pushed the claude/add-performance-harness-bx8YU branch from 197262a to 2dd66ba (March 23, 2026 21:44)