Summary
Embedding providers process batches sequentially — each batch API call must complete before the next is submitted. Submit multiple batches concurrently to reduce total embedding time by 2-5x, especially on high-latency API connections.
Context
All three embedders (OpenAI, Voyage, Ollama) use the same pattern: split texts into batches, then loop sequentially (for start := 0; start < len(texts); start += batchSize). Each iteration makes one HTTP POST and waits for the response before starting the next. For large chunk sets (e.g., 10,000 chunks with OpenAI batch size 2048 = 5 batches), this serializes 5 network round-trips.
Key files:
- internal/embedder/openai.go — Embed() method, batch loop (lines ~58-134), batch size 2048
- internal/embedder/voyage.go — Embed() method, batch size 128
- internal/embedder/ollama.go — Embed() method, batch size 64
Acceptance Criteria
- Per-batch retry behavior is unchanged (doWithRetry behavior preserved)
- Rate limiting (Retry-After) is respected and doesn't cause a thundering herd
- go test -race ./internal/embedder/... passes
Technical Approach
- In each embedder's Embed() method, replace the sequential batch loop with errgroup.Group with SetLimit(maxConcurrentBatches)
- Each goroutine processes one batch and stores its result at the correct index in a pre-allocated results slice
- Keep doWithRetry per-batch — concurrent batches retry independently
- Add a small jitter to concurrent batch starts to avoid API rate limit spikes
- Consider making maxConcurrentBatches configurable per provider (Ollama local = higher, OpenAI = moderate)
Dependencies
None — standalone improvement to the embedder package.
Out of Scope
- Token-aware batch sizing (follow-up optimization)
- Changing batch size defaults