Open
Description
Problem
When running qmd embed on a CPU-only machine (no GPU, AVX512 only), the process crashes with a SessionReleasedError after processing ~190 chunks. The GGUF embedding model runs out of memory on large collections.
Current behavior
qmd embed attempts to embed all un-embedded chunks in a single run. There is no way to limit how many chunks are processed per invocation.
Proposed solution
Add a --batch-size <n> or --limit <n> flag to qmd embed that caps the number of chunks processed per run. This would allow:
- CPU-only users to run embed in multiple passes without OOM crashes
- Easy integration with cron jobs that retry automatically
- Graceful degradation on resource-constrained machines
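To make the intended semantics concrete, here is a rough sketch of what a per-run cap could look like. This is not qmd's actual code: `take_batch`, `embed_in_passes`, and the integer chunk IDs are hypothetical names used only for illustration, and the real implementation would presumably live in qmd's own codebase.

```python
from itertools import islice


def take_batch(pending_chunks, limit=None):
    """Return at most `limit` pending chunks; None means no cap (hypothetical helper)."""
    if limit is not None and limit < 1:
        raise ValueError("limit must be a positive integer")
    # islice with a stop of None consumes the whole iterable.
    return list(islice(pending_chunks, limit))


def embed_in_passes(chunk_ids, limit):
    """Simulate repeated invocations, each embedding one capped batch.

    Returns the number of passes needed to cover every chunk.
    """
    embedded = set()
    passes = 0
    while True:
        # Each pass only sees chunks that earlier passes did not embed.
        pending = (c for c in chunk_ids if c not in embedded)
        batch = take_batch(pending, limit)
        if not batch:
            return passes
        embedded.update(batch)  # stand-in for the real embedding work
        passes += 1
```

With a cap of 100, a collection of 456 chunks would be covered in five passes, each small enough to stay under whatever memory ceiling the machine has.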
Workaround
I currently run qmd embed repeatedly in a retry loop. Because it skips already-embedded chunks, each pass picks up where the previous one crashed. This works but is inelegant.
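For anyone hitting the same crash, the retry loop can be sketched roughly as below. It assumes that a crashed run exits with a non-zero status and a completed run exits with 0, which I have not verified for every failure mode. `run_until_success` and its parameters are my own names, not part of qmd.

```python
import subprocess
import sys
import time


def run_until_success(cmd, max_attempts=10, delay_seconds=5.0):
    """Re-run `cmd` until it exits with status 0 or attempts run out.

    Returns the number of attempts used; raises RuntimeError if every
    attempt fails. Assumes a non-zero exit status means the run crashed.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} exited with {result.returncode}", file=sys.stderr)
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # brief pause before the next pass
    raise RuntimeError(f"{cmd!r} still failing after {max_attempts} attempts")


# Example call (not executed here):
#   run_until_success(["qmd", "embed"])
```

On Windows it may be necessary to pass the full path to the qmd executable (for example the result of `shutil.which("qmd")`) if the bare name is not resolved.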
Environment
- Windows 11, CPU-only (no GPU), AVX512
- Collection: ~175 files, ~456 vectors
- Installed via Bun