@jamestexas jamestexas commented Dec 16, 2025

Summary

Optimize database operations to significantly reduce execution time by eliminating per-record commits and adding query result caching.

Problem

Vunnel currently performs database commits after every single operation:

  1. Fix-date tracking commits after every insert (~322k commits for a full NVD run)
  2. Result writes use separate transactions per CVE (~322k transactions)
  3. Fix-date lookups hit the database every time, even for duplicate queries

This creates excessive disk I/O, especially on NFS. Processing larger providers such as NVD ultimately requires extra memory as well, since the uncompressed provider database is ~15 GB.

Solution

1. Batched Commits for Fix-Date Tracking

  • Before: conn.commit() after every insert
  • After: Batch commits every 2000 operations (configurable)
  • Auto-flushes on context exit to ensure data integrity
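
The batching pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `BatchedWriter` class, the `observed` table, and the `demo` helper are hypothetical names, and the sketch assumes the standard-library `sqlite3` module with its default transaction handling. It also flushes on exit even when an exception is propagating, which is an assumption about the desired behavior.

```python
import sqlite3


class BatchedWriter:
    """Hypothetical helper: commits once per `batch_size` inserts."""

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 2000):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = 0  # inserts since the last commit
        self.commits = 0  # tracked only so the demo can report it

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Flush whatever is left so a partial final batch is not lost.
        self.flush()
        return False

    def insert(self, vuln_id: str, first_observed: str) -> None:
        self.conn.execute(
            "INSERT INTO observed (vuln_id, first_observed) VALUES (?, ?)",
            (vuln_id, first_observed),
        )
        self.pending += 1
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Commit only when there is uncommitted work.
        if self.pending:
            self.conn.commit()
            self.commits += 1
            self.pending = 0


def demo(n_rows: int, batch_size: int) -> tuple[int, int]:
    """Insert n_rows and return (rows stored, commits issued)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE observed (vuln_id TEXT, first_observed TEXT)")
    conn.commit()
    with BatchedWriter(conn, batch_size) as writer:
        for i in range(n_rows):
            writer.insert(f"CVE-0000-{i}", "2025-01-01")
    rows = conn.execute("SELECT COUNT(*) FROM observed").fetchone()[0]
    conn.close()
    return rows, writer.commits
```

With `demo(5, 2)`, two commits are triggered by full batches and a third by the flush on exit, so all five rows are stored with three commits instead of five.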

2. Batched Commits for Result Writes

  • Before: Separate transaction per CVE write
  • After: Batch commits every 2000 operations (configurable)
  • Maintains single active transaction, commits periodically

3. LRU Caching for Fix-Date Lookups

  • Before: Every fixdater.best() call hit the database
  • After: functools.lru_cache(maxsize=10000) caches results
  • Eliminates duplicate queries for the same (CVE, CPE, version, ecosystem)
  • Cache cleared on context exit
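
As a rough sketch of the caching approach, the example below wraps a lookup in `functools.lru_cache` per instance and clears it on context exit. The `FixDateFinder` class, the `fixes` table, and `demo_cache` are hypothetical stand-ins and do not reflect the actual finder.py implementation; only `functools.lru_cache`, `cache_clear()`, and `cache_info()` are real standard-library APIs.

```python
import functools
import sqlite3


class FixDateFinder:
    """Hypothetical finder that caches lookups keyed by its arguments."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.db_queries = 0  # tracked only so the demo can report it
        # Wrap per instance so the cache belongs to this object
        # and can be cleared when the context exits.
        self._cached_best = functools.lru_cache(maxsize=10000)(self._query_best)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self._cached_best.cache_clear()
        return False

    def _query_best(self, vuln_id, cpe, version, ecosystem):
        # Reached only on a cache miss.
        self.db_queries += 1
        row = self.conn.execute(
            "SELECT MIN(fix_date) FROM fixes "
            "WHERE vuln_id = ? AND cpe = ? AND version = ? AND ecosystem = ?",
            (vuln_id, cpe, version, ecosystem),
        ).fetchone()
        return row[0]

    def best(self, vuln_id, cpe, version, ecosystem):
        return self._cached_best(vuln_id, cpe, version, ecosystem)


def demo_cache():
    """Run the same lookup twice; return (first, second, db queries, cache hits)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fixes "
        "(vuln_id TEXT, cpe TEXT, version TEXT, ecosystem TEXT, fix_date TEXT)"
    )
    key = ("CVE-2024-0001", "cpe:2.3:a:example:lib", "1.2.3", "pypi")
    conn.execute("INSERT INTO fixes VALUES (?, ?, ?, ?, ?)", (*key, "2024-02-01"))
    conn.commit()
    with FixDateFinder(conn) as finder:
        first = finder.best(*key)
        second = finder.best(*key)  # served from the cache
        hits = finder._cached_best.cache_info().hits
    conn.close()
    return first, second, finder.db_queries, hits
```

Wrapping the bound method per instance, rather than decorating the method at class level, keeps the cache scoped to one finder so that clearing it on exit does not affect other instances.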

Performance Impact

Tested on NVD provider with 322k CVEs:

  • Improvement: ~35% faster, from 17 minutes to 11 minutes (a saving of ~6 minutes on my Mac).
  • Database commits: Reduced from ~322k to ~161.
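
The commit count above follows from the batch size. As a quick check, assuming exactly 322,000 operations (the real count is approximate) and the default batch size of 2000, with `expected_commits` being a hypothetical helper:

```python
import math


def expected_commits(total_operations: int, batch_size: int) -> int:
    """Commits needed when flushing every `batch_size` operations,
    plus one final flush for any partial batch."""
    return math.ceil(total_operations / batch_size)


print(expected_commits(322_000, 2000))  # 161
```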

Testing

  • All existing tests pass
  • Added test for batched commit behavior
  • Added test for cache hit functionality

Fixes: https://linear.app/chainguard/issue/PLA-368/optimize-vunnel-database-operations-with-batching-and-caching

Memory Safety

A batch size of 2000 keeps memory usage bounded, since at most 2000 operations are pending between commits, while still providing significant performance gains.

Optimize database operations to reduce execution time by eliminating
per-record commits and adding query result caching.

Changes:

1. Batched commits for fix-date tracking (vunnel_first_observed.py)
   - Previously: commit after every single insert
   - Now: batch commits every 2000 operations (configurable)
   - Auto-flushes on context exit to ensure data integrity

2. Batched commits for result writes (result.py)
   - Previously: separate transaction per CVE write
   - Now: batch commits every 2000 operations (configurable)
   - Maintains single active transaction, commits periodically

3. LRU caching for fix-date lookups (finder.py)
   - Previously: every fixdater.best() call hit the database
   - Now: functools.lru_cache(maxsize=10000) caches results
   - Eliminates duplicate queries for the same (CVE, CPE, version, ecosystem)
   - Cache cleared on context exit

Performance impact:
- Tested on NVD provider with 322k CVEs
- Reduces execution time from 17 minutes to 11 minutes (35% faster)
- Reduces database commits from ~322k to ~161
- Eliminates duplicate database queries through caching

Testing:
- All existing tests pass
- Added tests for batching behavior
- Added tests for cache functionality

Signed-off-by: James Gardner <james.gardner@chainguard.dev>
@jamestexas jamestexas force-pushed the perf-batch-commits-and-caching branch from a307dd7 to 620a9a0 Compare December 16, 2025 23:48
Signed-off-by: James Gardner <james.gardner@chainguard.dev>
Add threading.Lock() to SQLiteStore to ensure thread-safe access to
transaction state. While current provider implementations call
writer.write() sequentially from the main thread, Ubuntu and RHEL
providers use ThreadPoolExecutor internally. Adding defensive locking
prevents potential data corruption if providers are refactored to
parallelize writes in the future.
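
A minimal sketch of the defensive locking idea, assuming a single shared `sqlite3` connection opened with `check_same_thread=False`. `LockedStore`, its `results` table, and `demo_parallel_writes` are hypothetical and are not the actual SQLiteStore code:

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedStore:
    """Hypothetical store: one lock guards the connection and batch counter."""

    def __init__(self, path: str = ":memory:", batch_size: int = 2000):
        # check_same_thread=False lets worker threads use this connection;
        # the lock below is what serializes access to it.
        self.conn = sqlite3.connect(path, check_same_thread=False)
        self.conn.execute("CREATE TABLE results (id TEXT PRIMARY KEY, payload TEXT)")
        self.batch_size = batch_size
        self.pending = 0
        self.lock = threading.Lock()

    def write(self, identifier: str, payload: str) -> None:
        with self.lock:
            self.conn.execute(
                "INSERT OR REPLACE INTO results VALUES (?, ?)",
                (identifier, payload),
            )
            self.pending += 1
            if self.pending >= self.batch_size:
                self.conn.commit()
                self.pending = 0

    def finish(self) -> int:
        """Commit any partial batch, close, and return the stored row count."""
        with self.lock:
            self.conn.commit()
            self.pending = 0
            count = self.conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
            self.conn.close()
        return count


def demo_parallel_writes(n: int = 500, workers: int = 8) -> int:
    store = LockedStore(batch_size=100)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so any worker exception is raised here.
        list(pool.map(lambda i: store.write(f"CVE-{i}", "{}"), range(n)))
    return store.finish()
```

Holding the lock across both the insert and the counter update keeps the pending count consistent with what is actually in the open transaction, which is the state the commit message is concerned about.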

Also removes duplicate test definition in test_finder.py.

Signed-off-by: James Gardner <james.gardner@chainguard.dev>
@jamestexas jamestexas marked this pull request as ready for review December 18, 2025 22:43