Concurrent embedding batch processing #19

@johnlanda

Description

Summary

Embedding providers currently process batches sequentially: each batch API call must complete before the next is submitted. Submitting multiple batches concurrently should reduce total embedding time by 2-5x, especially on high-latency API connections.

Context

All three embedders (OpenAI, Voyage, Ollama) use the same pattern: split texts into batches, then loop sequentially (`for start := 0; start < len(texts); start += batchSize`). Each iteration makes one HTTP POST and waits for the response before starting the next. For large chunk sets (e.g., 10,000 chunks at OpenAI's batch size of 2048 = 5 batches), this serializes five network round-trips.

Key files:

  • internal/embedder/openai.go: `Embed()` method, batch loop (~lines 58-134), batch size 2048
  • internal/embedder/voyage.go: `Embed()` method, batch size 128
  • internal/embedder/ollama.go: `Embed()` method, batch size 64

Acceptance Criteria

  • Embedding batches are submitted concurrently with bounded concurrency (e.g., max 3-5 in-flight)
  • Results are reassembled in the correct order (matching input text indices)
  • Retry logic still works correctly per batch (existing `doWithRetry` behavior preserved)
  • Rate-limit handling (429 + `Retry-After`) is respected and doesn't cause a thundering herd
  • `go test -race ./internal/embedder/...` passes
  • Measurable latency reduction when embedding 1000+ chunks

Technical Approach

  1. In each embedder's `Embed()` method, replace the sequential batch loop with an `errgroup.Group` bounded by `SetLimit(maxConcurrentBatches)`
  2. Each goroutine processes one batch and stores its result at the correct offset in a pre-allocated results slice
  3. Keep `doWithRetry` per batch so concurrent batches retry independently
  4. Add a small jitter to concurrent batch starts to avoid API rate-limit spikes
  5. Consider making `maxConcurrentBatches` configurable per provider (Ollama local = higher, OpenAI = moderate)

Dependencies

None — standalone improvement to the embedder package.

Out of Scope

  • Token-aware batch sizing (follow-up optimization)
  • Changing batch size defaults
