Summary
The mctl up sync pipeline processes dependencies strictly sequentially — each dependency must complete its full fetch→chunk→embed→upsert cycle before the next begins. Parallelize dependency processing with a bounded worker pool to achieve 3-10x speedup on multi-dependency projects.
Context
The main sync loop in internal/pipeline/pipeline.go (line ~168) iterates dependencies with a plain for loop. Each dependency goes through: artifact check → git clone → file extraction → chunking → embedding API call → LanceDB upsert. These steps are independent across dependencies, so they are safe to parallelize.
Key files:
- internal/pipeline/pipeline.go: Sync() function, containing the main for _, dep := range m.Dependencies loop (~line 168)
- cmd/up.go: CLI entry point that calls pipeline.Sync() and aggregates results
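Acceptance Criteria
- […] (min(len(deps), GOMAXPROCS) or configurable)
- go test ./internal/pipeline/... passes with race detector (-race)
- […] (time mctl up)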
Technical Approach
- Add a concurrency option to the pipeline config (default: runtime.GOMAXPROCS(0))
- Replace the sequential loop with an errgroup.Group limited via SetLimit(concurrency), or with a semaphore-based worker pool
- Collect SyncResult values via a thread-safe slice or channel
- Ensure the embedder and store are safe for concurrent use (LanceDB connection is thread-safe; embedders are stateless per-call)
- Write lockfile only after all goroutines complete
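
The approach above can be sketched with the standard library alone, using the semaphore-based variant so the example needs no external module. This is a minimal illustration, not the actual pipeline code: the Dependency and SyncResult fields, the syncOne callback, and the resolveConcurrency helper are hypothetical stand-ins. It also assumes that all errors should be collected rather than cancelling on the first failure, which the issue does not specify.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"sync"
)

// Dependency and SyncResult are simplified stand-ins for the real pipeline types.
type Dependency struct{ Name string }

type SyncResult struct {
	Name   string
	Chunks int
}

// resolveConcurrency (hypothetical helper) returns the configured value if it
// is positive, otherwise GOMAXPROCS, capped at the number of dependencies and
// never less than 1.
func resolveConcurrency(configured, numDeps int) int {
	n := configured
	if n <= 0 {
		n = runtime.GOMAXPROCS(0)
	}
	if n > numDeps {
		n = numDeps
	}
	if n < 1 {
		n = 1
	}
	return n
}

// syncAll runs syncOne for every dependency with at most the resolved number
// of goroutines active at once. Results keep the input order, and all errors
// are joined into one.
func syncAll(deps []Dependency, concurrency int, syncOne func(Dependency) (SyncResult, error)) ([]SyncResult, error) {
	limit := resolveConcurrency(concurrency, len(deps))
	sem := make(chan struct{}, limit) // counting semaphore

	// Each goroutine writes only to its own index, so no mutex is needed.
	results := make([]SyncResult, len(deps))
	errs := make([]error, len(deps))

	var wg sync.WaitGroup
	for i, dep := range deps {
		wg.Add(1)
		sem <- struct{}{} // blocks while `limit` workers are busy
		go func(i int, dep Dependency) {
			defer wg.Done()
			defer func() { <-sem }()
			results[i], errs[i] = syncOne(dep)
		}(i, dep)
	}
	wg.Wait()

	// The lockfile write would go here, after every goroutine has finished.
	return results, errors.Join(errs...)
}

func main() {
	deps := []Dependency{{"a"}, {"b"}, {"c"}}
	results, err := syncAll(deps, 2, func(d Dependency) (SyncResult, error) {
		// Placeholder for fetch, chunk, embed, and upsert.
		return SyncResult{Name: d.Name, Chunks: len(d.Name)}, nil
	})
	fmt.Println(len(results), err) // prints: 3 <nil>
}
```

Writing results by index keeps the output order deterministic regardless of which dependency finishes first, which may simplify aggregation in cmd/up.go. An errgroup.Group with SetLimit would replace the semaphore channel and WaitGroup, at the cost of depending on golang.org/x/sync.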
Dependencies
None — standalone improvement.
Out of Scope
- Parallel chunking within a single dependency (separate issue)
- Concurrent embedding batches within a single dependency (separate issue)
- Progress bar / per-dependency status reporting