Feature request: --batch-size / --limit flag for embed command #215

@daronyondem

Description

Problem

When running qmd embed on a CPU-only machine (no GPU, AVX512 only), the process crashes with a SessionReleasedError after processing ~190 chunks. The GGUF embedding model runs out of memory on large collections.

Current behavior

qmd embed attempts to embed all un-embedded chunks in a single run. There is no way to limit how many chunks are processed per invocation.

Proposed solution

Add a --batch-size <n> or --limit <n> flag to qmd embed that caps the number of chunks processed per run. This would allow:

  • CPU-only users to run embed in multiple passes without OOM crashes
  • Easy integration with cron jobs that retry automatically
  • Graceful degradation on resource-constrained machines
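With a cap in place, a scheduler could chip away at a large backlog in bounded passes. A hypothetical crontab entry, assuming the proposed `--limit` flag is adopted (the flag name and the value 100 are from this proposal, not the current CLI):

```
# Every 15 minutes, embed up to 100 pending chunks (proposed flag, not yet implemented)
*/15 * * * * qmd embed --limit 100
```

Because `qmd embed` already skips embedded chunks, each scheduled run would pick up where the previous one left off.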

Workaround

The current workaround is to run `qmd embed` repeatedly in a retry loop: since already-embedded chunks are skipped, each pass makes forward progress even after a crash. This works, but it is inelegant.
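The retry loop can be sketched as a small POSIX-shell helper. This is only an illustration of the workaround; the `retry` function name and the attempt cap are arbitrary, and it relies solely on `qmd embed` exiting non-zero when the session crashes:

```shell
# Re-run a command until it exits cleanly, up to a maximum number of attempts.
retry() {
  max=$1
  shift
  n=0
  until "$@"; do
    n=$((n + 1))
    if [ "$n" -ge "$max" ]; then
      echo "giving up after $max attempts" >&2
      return 1
    fi
    echo "attempt $n/$max failed; retrying..." >&2
  done
}
```

Usage would then be `retry 10 qmd embed`, where 10 is an arbitrary safety cap to avoid looping forever on a persistent failure.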

Environment

  • Windows 11, CPU-only (no GPU), AVX512
  • Collection: ~175 files, ~456 vectors
  • Installed via Bun
