perf: Add batched commits and LRU caching for database operations #957
Summary
Optimize database operations to significantly reduce execution time by eliminating per-record commits and adding query result caching.
Problem
Vunnel currently performs a database commit after every single operation, as in the sketch below.
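A minimal sketch of the per-record pattern (illustrative only: the `results` table and `store_records` helper are assumptions, not vunnel's actual code):

```python
import sqlite3

def store_records(conn: sqlite3.Connection, records: list[tuple[str, str]]) -> None:
    for cve_id, payload in records:
        conn.execute(
            "INSERT OR REPLACE INTO results (id, record) VALUES (?, ?)",
            (cve_id, payload),
        )
        conn.commit()  # one disk flush per record; very slow, especially on NFS
```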
This creates excessive disk I/O, especially when using NFS, and ultimately requires extra memory to process larger providers (like NVD), since the uncompressed provider database is ~15 GB.
Solution
1. Batched Commits for Fix-Date Tracking
Before: `conn.commit()` after every insert.
After: inserts are committed in batches (batch size 2000; see Memory Safety below, and the sketch after item 2).

2. Batched Commits for Result Writes

Result writes use the same batched-commit approach instead of committing each record individually.
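A sketch of the batched pattern applied in both cases (the shape is an assumption; only the batch size of 2000 and the use of batched commits come from this PR):

```python
import sqlite3

BATCH_SIZE = 2000  # bounds how many uncommitted rows accumulate before a flush

def store_records_batched(conn: sqlite3.Connection, records: list[tuple[str, str]]) -> None:
    pending = 0
    for cve_id, payload in records:
        conn.execute(
            "INSERT OR REPLACE INTO results (id, record) VALUES (?, ?)",
            (cve_id, payload),
        )
        pending += 1
        if pending >= BATCH_SIZE:
            conn.commit()  # one flush per 2000 records instead of one per record
            pending = 0
    if pending:
        conn.commit()  # flush the final partial batch
```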
3. LRU Caching for Fix-Date Lookups
Before: every `fixdater.best()` call hit the database.
After: `functools.lru_cache(maxsize=10000)` caches results, as sketched below.
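A sketch of the caching approach (the function name, signature, and schema here are hypothetical; the PR only specifies putting `functools.lru_cache(maxsize=10000)` in front of the fix-date lookup):

```python
import functools
import sqlite3

conn = sqlite3.connect("fixdates.db")  # hypothetical path and schema

@functools.lru_cache(maxsize=10000)
def best_fix_date(cve_id: str, package: str) -> str | None:
    # Only a cache miss reaches the database; repeated lookups for the
    # same (cve_id, package) pair are answered from memory.
    row = conn.execute(
        "SELECT fix_date FROM fixes WHERE cve_id = ? AND package = ?",
        (cve_id, package),
    ).fetchone()
    return row[0] if row else None
```

`lru_cache` is a good fit here because fix-date lookups are read-only within a provider run, so stale cache entries are not a concern.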
Performance Impact

Tested on the NVD provider with 322k CVEs.
Testing
Fixes: https://linear.app/chainguard/issue/PLA-368/optimize-vunnel-database-operations-with-batching-and-caching
Memory Safety
A batch size of 2000 keeps memory usage bounded while still providing significant performance gains.