Further investigations for performance optimizations #2

@jmwoliver

Digging through the code, some spots to investigate more:


1. SIMD Binary Detection (High Impact, Low Effort)
Files: simd.zig, parallel_walker.zig
Problem (parallel_walker.zig:446-450):

    for (data[0..check_len]) |byte| {
        if (byte == 0) return; // Byte-by-byte for 8KB!
    }

Solution: Add containsNul() using SIMD vectors to check 16-32 bytes at a time.
Expected gain: 10-15%
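A rough sketch of what containsNul() could look like, using Zig's built-in @Vector type. The 16-byte lane width and the exact signature are assumptions, not what simd.zig currently has; a 32-byte width may suit AVX2 targets better.

```zig
const std = @import("std");

/// Hypothetical helper: returns true if `data` contains a NUL (0x00) byte.
pub fn containsNul(data: []const u8) bool {
    const lanes = 16; // assumed vector width
    const Vec = @Vector(lanes, u8);
    const zeros: Vec = @splat(0);

    var i: usize = 0;
    // Compare `lanes` bytes per iteration.
    while (i + lanes <= data.len) : (i += lanes) {
        const chunk: Vec = data[i..][0..lanes].*;
        // `chunk == zeros` yields a bool per lane; OR-reduce them to one bool.
        if (@reduce(.Or, chunk == zeros)) return true;
    }
    // Scalar loop for the remaining (< lanes) bytes.
    for (data[i..]) |byte| {
        if (byte == 0) return true;
    }
    return false;
}
```

The binary check in parallel_walker.zig would then become something like `if (containsNul(data[0..check_len])) return;`.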

2. Eliminate Path Allocations (High Impact, Medium Effort)
File: parallel_walker.zig
Problem (line 393):

    const full_path = std.fs.path.join(alloc, &.{ work.path, entry.name }) catch continue;

Every file/directory entry causes a heap allocation.
Solution: Use a stack-allocated path buffer with FixedBufferAllocator:

    var path_buf: [std.fs.max_path_bytes]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&path_buf);
    // Use fba.allocator() for path joins, reset between iterations

Expected gain: 10-15%
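To make the reset-per-iteration idea concrete, here is a minimal sketch. `joinInto` is a hypothetical helper name, not something in parallel_walker.zig today.

```zig
const std = @import("std");

/// Hypothetical helper: joins `dir` and `name` inside the FBA's backing
/// buffer. The returned slice is only valid until the next call, because
/// reset() hands the same bytes out again.
fn joinInto(
    fba: *std.heap.FixedBufferAllocator,
    dir: []const u8,
    name: []const u8,
) ![]u8 {
    fba.reset();
    return std.fs.path.join(fba.allocator(), &.{ dir, name });
}
```

One thing to watch: paths that outlive the iteration (for example, subdirectories pushed onto a work deque) would still need a heap copy, so the saving applies mainly to file entries that are processed immediately.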

3. Gitignore Pattern Caching (High Impact, Medium Effort)
Files: gitignore.zig, parallel_walker.zig
Problem (parallel_walker.zig:374-376, 235-299): loadParentGitignores() walks up the directory tree and re-parses .gitignore files for EVERY directory processed.
Solution: Create a thread-safe GitignoreCache:
- Cache parsed gitignore patterns per directory
- Store parent inheritance chain
- Mutex-protected lookup with lock-free reads for cached entries

    pub const GitignoreCache = struct {
        mutex: std.Thread.Mutex,
        cache: std.StringHashMapUnmanaged(CachedIgnoreState),

        pub fn getOrCreate(self: *GitignoreCache, dir_path: []const u8) !*CachedIgnoreState {
            // Fast path: check cache
            // Slow path: lock, load parents, parse .gitignore, cache
        }
    };

Expected gain: 15-20%

4. Better Initial Work Distribution (Medium Impact, Low Effort)
File: parallel_walker.zig
Problem (lines 136-154): A single root path means only thread 0 gets initial work; the others must steal.
Solution: Pre-scan the root directory to distribute subdirectories across all threads:

    if (path_idx == 1 and self.num_threads > 1) {
        // Open root dir, distribute subdirs to all thread deques round-robin
    }

Expected gain: 5-10%
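The round-robin part could be as simple as the sketch below. `assignRoundRobin` and `owners` are hypothetical names; the real code would push each subdirectory onto the chosen thread's deque rather than record an index.

```zig
const std = @import("std");

/// Hypothetical helper: `owners[i]` receives the index of the worker that
/// would get `subdirs[i]`, cycling 0, 1, ..., num_threads - 1.
fn assignRoundRobin(
    subdirs: []const []const u8,
    owners: []usize,
    num_threads: usize,
) void {
    std.debug.assert(num_threads > 0);
    std.debug.assert(owners.len == subdirs.len);
    for (owners, 0..) |*owner, i| {
        owner.* = i % num_threads;
    }
}
```

Round-robin ignores subtree size, so one large subdirectory can still leave a thread with most of the work; stealing would remain the fallback for that case.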

5. SIMD Case-Insensitive Search (Medium Impact, High Effort)
Files: simd.zig, matcher.zig
Problem (matcher.zig:128-147): std.ascii.toLower() is called per byte.
Solution: SIMD case-folding using bit manipulation:

    // ASCII: uppercase = lowercase ^ 0x20 (for letters only)
    const case_bit: Vec = @splat(0x20);
    // Check if in A-Z range, then OR with case_bit

Expected gain: 10-20% for -i searches (not applicable to current benchmark)
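A sketch of the case-folding step on its own, assuming 16-byte vectors. `toLowerVec` and `lowerInPlace` are hypothetical names; the actual search would fold the haystack chunk and compare it against a pre-lowered needle rather than rewrite the buffer.

```zig
const std = @import("std");

const lanes = 16; // assumed vector width
const Vec = @Vector(lanes, u8);

/// Hypothetical helper: lowercases the ASCII letters in one chunk.
fn toLowerVec(chunk: Vec) Vec {
    const upper_a: Vec = @splat('A');
    const letter_count: Vec = @splat(26);
    const case_bit: Vec = @splat(0x20);
    const zero: Vec = @splat(0);

    // (c -% 'A') < 26 holds only for 'A'...'Z': bytes below 'A' wrap
    // around to large values, bytes above 'Z' give 26 or more.
    const is_upper = (chunk -% upper_a) < letter_count;
    // Set bit 0x20 only in the lanes that hold an uppercase letter.
    return chunk | @select(u8, is_upper, case_bit, zero);
}

/// Hypothetical helper: lowercases `buf` in place, 16 bytes at a time,
/// with a scalar loop for the tail.
fn lowerInPlace(buf: []u8) void {
    var i: usize = 0;
    while (i + lanes <= buf.len) : (i += lanes) {
        const chunk: Vec = buf[i..][0..lanes].*;
        const folded: [lanes]u8 = toLowerVec(chunk);
        buf[i..][0..lanes].* = folded;
    }
    for (buf[i..]) |*b| b.* = std.ascii.toLower(b.*);
}
```

This only handles ASCII; non-ASCII bytes pass through unchanged, which matches what std.ascii.toLower() does per byte.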

| Step | Optimization | Files | Est. Gain |
|------|--------------|-------|-----------|
| 1 | SIMD binary detection | simd.zig, parallel_walker.zig | 10-15% |
| 2 | Stack-allocated path buffers | parallel_walker.zig | 10-15% |
| 3 | Gitignore caching | gitignore.zig, parallel_walker.zig | 15-20% |
| 4 | Initial work distribution | parallel_walker.zig | 5-10% |
| 5 | SIMD case-insensitive search | simd.zig, matcher.zig | 10-20% |

Conservative total estimate: 35-45% improvement
Note: Cross-platform implementation only (no Linux-specific syscall optimizations)

| File | Changes |
|------|---------|
| src/simd.zig | Add containsNul(), optionally findSubstringIgnoreCase() |
| src/parallel_walker.zig | Path buffer optimization, gitignore cache integration, work distribution, use SIMD binary detection |
| src/gitignore.zig | Add GitignoreCache struct |
| src/matcher.zig | Integrate SIMD case-insensitive (if implementing) |

A second investigation turned up similar findings, plus a few others:


Gitignore Pattern Caching (Medium Impact)
Problem: loadParentGitignores() in parallel_walker.zig:235-299 walks up the directory tree for every work item.
Fix:
- Use a thread-safe cache keyed by directory path
- Only load each .gitignore once

Optimize ** Pattern Matching (Medium Impact)
Problem: globMatch() in gitignore.zig:72-162 uses recursion for ** patterns (line 92).
Fix:
- Convert to iterative approach
- Cache compiled pattern state
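To illustrate the iterative approach, here is a deliberately simplified sketch that replaces recursion with a single saved backtrack point. It is not a drop-in for globMatch(): it assumes `**` is the only wildcard (a lone `*` is treated as a literal), and it does not implement the gitignore rule that `a/**/b` also matches `a/b`. `matchDoubleStar` is a hypothetical name.

```zig
const std = @import("std");

/// Hypothetical, simplified matcher: `**` matches any run of bytes
/// (including '/'); every other pattern byte must match literally.
fn matchDoubleStar(pattern: []const u8, text: []const u8) bool {
    var p: usize = 0; // position in pattern
    var t: usize = 0; // position in text
    var star_p: ?usize = null; // pattern index just past the last `**`
    var star_t: usize = 0; // text index where that `**` currently ends

    while (t < text.len) {
        if (std.mem.startsWith(u8, pattern[p..], "**")) {
            // Record a backtrack point; start with `**` matching nothing.
            p += 2;
            star_p = p;
            star_t = t;
        } else if (p < pattern.len and pattern[p] == text[t]) {
            p += 1;
            t += 1;
        } else if (star_p) |resume_p| {
            // Mismatch: let the last `**` absorb one more byte and retry.
            star_t += 1;
            t = star_t;
            p = resume_p;
        } else {
            return false;
        }
    }
    // Text is consumed: only trailing `**` tokens may remain.
    while (std.mem.startsWith(u8, pattern[p..], "**")) p += 2;
    return p == pattern.len;
}
```

Supporting single `*` (which must not cross '/') alongside `**` would likely need a second saved backtrack point, one per wildcard kind.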

Reduce Memory Allocation (Low Impact)
Problem: _platform_memmove (2.7%) and allocation overhead are visible in the trace.
Fix:
- Pre-allocate FileBuffer with reasonable initial capacity
- Reuse buffers across files

| Priority | File | Changes |
|----------|------|---------|
| P0 | src/parallel_walker.zig | Add per-worker output buffers |
| P0 | src/output.zig | Add bulk flush API |
| P1 | src/gitignore.zig | Pattern caching, iterative ** |
| P2 | src/reader.zig | Buffer reuse |
