perf(semantic): run batch overview generation and file summaries concurrently (#840)
Open
ahmedhesham6 wants to merge 2 commits into volcengine:main
Conversation
**perf(semantic): run batch overview generation and file summaries concurrently**

The semantic processor generates directory overviews by splitting large directories into batches of 50 files and calling the VLM for each batch. Previously, both file summary generation in `_process_memory_directory` and batch overview generation in `_batched_generate_overview` ran sequentially, causing directories with 1000+ files to take 15+ minutes as each VLM call blocked the next.

This change runs both operations concurrently using `asyncio.gather`, bounded by the existing `max_concurrent_llm` semaphore:

- `_process_memory_directory`: changed files now generate summaries in parallel instead of awaiting each one sequentially. Cached summaries are still reused for unchanged files.
- `_batched_generate_overview`: all batch prompts are pre-built, then dispatched concurrently via `asyncio.gather`, with the LLM semaphore controlling concurrency. Batch ordering is preserved via an indexed list.

With `max_concurrent_llm=20`, a 1000-file directory that previously took ~15 minutes for the batch step now completes in ~23 seconds (~40x improvement). The final merge step remains sequential as it depends on all batches completing.
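The commit message describes a specific dispatch pattern: pre-build all prompts, fire them concurrently, and keep batch order via an indexed list. A minimal, hypothetical sketch of that pattern follows; the helper names (`fake_vlm_call`, the prompt format) are stand-ins, not the project's real functions.

```python
import asyncio

async def fake_vlm_call(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stands in for VLM latency
    return f"overview({prompt})"

async def batched_generate_overview(files, llm_sem, batch_size=50):
    # Pre-build every batch prompt before dispatching any calls.
    batches = [files[i:i + batch_size] for i in range(0, len(files), batch_size)]
    prompts = [f"batch {i}: {len(b)} files" for i, b in enumerate(batches)]

    results = [None] * len(prompts)  # indexed list preserves batch order

    async def run_batch(idx, prompt):
        async with llm_sem:  # bounded by max_concurrent_llm
            results[idx] = await fake_vlm_call(prompt)

    await asyncio.gather(*(run_batch(i, p) for i, p in enumerate(prompts)))
    # The final merge stays sequential: it depends on every batch result.
    return "\n".join(results)
```

With `asyncio.Semaphore(20)` and 20 batches, every call runs in a single wave, which is consistent with the runtime dropping to roughly the latency of one VLM call.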
Bortlesboat reviewed on Mar 22, 2026:
…ormatting

Thread `llm_sem` through `_generate_overview` and `_batched_generate_overview` so callers can share a single semaphore across the full pipeline, preventing concurrent calls from exceeding the intended concurrency limit.
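The review suggestion amounts to creating one semaphore at the top of the pipeline and passing it into every coroutine that makes VLM calls, so summary and overview work can never exceed the intended limit combined. A hypothetical illustration, with function names standing in for the PR's real methods:

```python
import asyncio

async def generate_file_summary(name: str, llm_sem) -> str:
    async with llm_sem:  # acquire the shared limit before each VLM call
        await asyncio.sleep(0.01)  # stands in for a VLM call
        return f"summary:{name}"

async def generate_overview(names, llm_sem) -> str:
    async with llm_sem:  # the SAME semaphore bounds overview calls too
        await asyncio.sleep(0.01)
        return f"overview of {len(names)} files"

async def run_pipeline(names, max_concurrent_llm=20):
    # One semaphore, created once, threaded through every callee.
    llm_sem = asyncio.Semaphore(max_concurrent_llm)
    summaries = await asyncio.gather(
        *(generate_file_summary(n, llm_sem) for n in names)
    )
    overview = await generate_overview(names, llm_sem)
    return summaries, overview
```

If each function instead created its own semaphore, concurrent callers could each run `max_concurrent_llm` calls at once, multiplying the effective limit.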
Description
The semantic processor generates directory overviews by splitting large directories into batches of 50 files and calling the VLM for each batch. Previously, both file summary generation in `_process_memory_directory` and batch overview generation in `_batched_generate_overview` ran sequentially, so each VLM call blocked the next. For directories with 1000+ files (common in memory directories like `entities`, `events`, and `preferences`), this caused a single queue item to take 15+ minutes, blocking the entire semantic queue.

This change runs both operations concurrently using `asyncio.gather`, bounded by the existing `max_concurrent_llm` semaphore.

Related Issue
N/A — discovered during production usage with large memory directories (1000+ entity memories).
Type of Change
Changes Made
- `_process_memory_directory`: Changed file summary generation for modified/added files to run concurrently via `asyncio.gather` instead of sequential `await` in a loop. Cached summaries for unchanged files are still reused. Order is preserved via a pre-allocated indexed list.
- `_batched_generate_overview`: All batch prompts are pre-built in the existing loop, then dispatched concurrently via `asyncio.gather`. Each VLM call is bounded by `async with llm_sem` to respect `max_concurrent_llm`. Batch ordering is preserved via an indexed list. The final merge step remains sequential as it depends on all batches completing.

Testing
Ran with `max_concurrent_llm=20` against a directory with 1,214 memory files split into 20 batches.

Before (sequential): `memories/entities` (1,000 files, 20 batches) took ~15 minutes for the batch overview step alone.
After (concurrent): the same directory took ~23 seconds for the batch overview step (~40x improvement).

Verified against `memories/entities`, `memories/cases`, and `memories/patterns`.

Checklist
Additional Notes
The semantic queue processes items sequentially (one at a time). When a single memory directory with 1000+ files enters the queue, it blocks all other items for the duration of its processing. This change does not alter that single-consumer behavior — it only parallelizes the VLM calls within a single queue item.
The `max_concurrent_llm` semaphore (configured via `vlm.max_concurrent` in `ov.conf`) controls the degree of parallelism. The default of 100 is appropriate for most VLM providers. The change is fully backward-compatible: with `max_concurrent_llm=1`, behavior is identical to sequential execution.
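The cache-reuse and backward-compatibility claims can be sketched together: unchanged files take their summaries from a cache, changed files are summarized concurrently under the semaphore, and a `Semaphore(1)` degrades gracefully to sequential execution. This is a hypothetical sketch; the helper names and the cache shape are assumptions, not the project's real API.

```python
import asyncio

async def summarize_file(path: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a VLM call
    return f"summary of {path}"

async def process_memory_directory(paths, cache: dict, llm_sem):
    results = [None] * len(paths)  # indexed list preserves file order
    tasks = []

    for idx, path in enumerate(paths):
        if path in cache:  # unchanged file: reuse the cached summary
            results[idx] = cache[path]
        else:
            async def run(i=idx, p=path):  # defaults pin loop variables
                async with llm_sem:  # bounded by max_concurrent_llm
                    results[i] = await summarize_file(p)
                    cache[p] = results[i]
            tasks.append(run())

    # With asyncio.Semaphore(1) this runs one call at a time, matching
    # the old sequential behavior exactly.
    await asyncio.gather(*tasks)
    return results
```

The default-argument trick (`i=idx, p=path`) pins each iteration's values; closing over the loop variables directly would make every task see the last iteration's path.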