feat(query): add cursor-based pagination and optional JSONL streaming for query results #132

@mfittko

Problem

topK currently acts as both the retrieval budget and the response size cap. In hybrid mode, this can truncate useful results, and it offers no stable way to page through a result set.

Proposal

Introduce explicit pagination + capped responses for query endpoints and CLI:

  1. Cursor-based pagination (primary)

    • API request supports limit and cursor.
    • API response includes nextCursor and paging metadata.
    • Stable ordering contract for pagination (score DESC, tie-break by deterministic id).
  2. Result cap controls

    • Add server-side hard cap (maxResults) to bound runtime/memory.
    • Keep internal retrieval budget separate from client-facing page size.
  3. CLI support

    • Add --limit and --cursor in raged query.
    • Keep --topK for backward compatibility (deprecate in help text once limit is available).
  4. Optional JSONL mode

    • Add --stream jsonl (or equivalent) for pipeline-friendly output.
    • Stream event lines (meta, result, done) with deterministic result order.
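As a rough illustration of items 1 and 2, the sketch below pages over an already-retrieved candidate list using an opaque cursor. It is not the raged implementation: the function names, the cursor encoding (base64-wrapped JSON of the last sort key), the shape of the paging metadata, and the MAX_RESULTS value are all assumptions made for this example. Only limit, cursor, nextCursor, maxResults, and the ordering contract (score DESC, tie-break by id) come from the proposal.

```python
import base64
import json
from typing import Dict, List, Optional, Tuple

MAX_RESULTS = 100  # hypothetical value for the server-side hard cap (maxResults)


def encode_cursor(score: float, doc_id: str) -> str:
    """Pack the last returned sort key into an opaque, URL-safe token."""
    raw = json.dumps({"score": score, "id": doc_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> Tuple[float, str]:
    """Recover (score, id) from a token produced by encode_cursor."""
    data = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return data["score"], data["id"]


def sort_key(item: Dict) -> Tuple[float, str]:
    # score DESC (negated), then id ASC as the deterministic tie-break
    return (-item["score"], item["id"])


def paginate(candidates: List[Dict], limit: int, cursor: Optional[str] = None) -> Dict:
    """Return one page of results plus nextCursor (None on the last page)."""
    limit = max(1, min(limit, MAX_RESULTS))  # clamp client page size to the cap
    ordered = sorted(candidates, key=sort_key)
    if cursor is not None:
        score, doc_id = decode_cursor(cursor)
        last_seen = (-score, doc_id)
        # keep only items that sort strictly after the last returned item
        ordered = [c for c in ordered if sort_key(c) > last_seen]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = encode_cursor(page[-1]["score"], page[-1]["id"]) if has_more else None
    return {
        "results": page,
        "nextCursor": next_cursor,
        "paging": {"limit": limit, "returned": len(page)},
    }
```

A client would call paginate(candidates, limit) for the first page and pass the returned nextCursor back until it is None. The candidate list here stands in for the internal retrieval budget, which stays independent of the client-facing limit.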

Why cursor over page/offset

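Offset-based pagination can return duplicates or skip results when rankings shift between requests because of ingestion, re-embedding, or reranking, especially in hybrid retrieval. Cursor (keyset) pagination resumes from the last returned sort key rather than from a position, so it is more stable under such shifts and is typically more efficient.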
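A toy example may make the difference concrete. The sketch below uses made-up documents and scores, and hypothetical helper names. It fetches a first page, ingests a new high-scoring document, then fetches the second page both ways.

```python
def ordered(items):
    # Ordering contract from the proposal: score DESC, tie-break by id
    return sorted(items, key=lambda r: (-r["score"], r["id"]))


def offset_page(items, offset, limit):
    """Page by position in the current ranking."""
    return [r["id"] for r in ordered(items)[offset:offset + limit]]


def keyset_page(items, after, limit):
    """Page by resuming strictly after the last returned sort key."""
    rows = ordered(items)
    if after is not None:
        rows = [r for r in rows if (-r["score"], r["id"]) > after]
    return [r["id"] for r in rows[:limit]]


index = [
    {"id": "a", "score": 0.9},
    {"id": "b", "score": 0.8},
    {"id": "c", "score": 0.7},
    {"id": "d", "score": 0.6},
]

first_offset = offset_page(index, 0, 2)     # ["a", "b"]
first_keyset = keyset_page(index, None, 2)  # ["a", "b"]
last_key = (-0.8, "b")                      # sort key of the last item on page 1

# A new document is ingested between the two requests and ranks first.
index.append({"id": "x", "score": 0.95})

second_offset = offset_page(index, 2, 2)        # ["b", "c"]: "b" is repeated
second_keyset = keyset_page(index, last_key, 2)  # ["c", "d"]: no repeat
```

In this run neither approach surfaces the newly ingested document to a client that has already moved past the first page. The keyset variant only guarantees that already-returned items are not repeated and that lower-ranked items are not skipped. If the score of an existing document changes between requests, a cursor can still miss or repeat it, which is why the acceptance criteria scope determinism to the same index state.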

Acceptance Criteria

  • API query supports limit + cursor and returns nextCursor.
  • Pagination is deterministic for the same index state and request.
  • Server enforces a configurable hard cap for returned/processed results.
  • CLI supports --limit + --cursor.
  • CLI keeps --topK compatibility (documented behavior).
  • Optional JSONL output mode is available and documented.
  • Tests cover paging behavior, cursor correctness, and JSONL output format.
  • Docs updated (docs/03-cli.md, API reference).
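For the JSONL criterion, one possible shape of the stream is sketched below. The proposal only names the event kinds (meta, result, done). The field names (type, count, rank, nextCursor) and the function name are assumptions for illustration, not a settled format.

```python
import json
from typing import Dict, Iterator, List, Optional


def stream_jsonl(results: List[Dict], next_cursor: Optional[str] = None) -> Iterator[str]:
    """Yield one JSON document per line: a meta event, one result event
    per hit in ranked order, then a final done event."""
    yield json.dumps({"type": "meta", "count": len(results)})
    for rank, item in enumerate(results, start=1):
        yield json.dumps(
            {"type": "result", "rank": rank, "id": item["id"], "score": item["score"]}
        )
    yield json.dumps({"type": "done", "nextCursor": next_cursor})


if __name__ == "__main__":
    hits = [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.8}]
    for line in stream_jsonl(hits, next_cursor=None):
        print(line)
```

Because every line is a complete JSON object, downstream tools can process events as they arrive, and a test can assert on the sequence of type values without parsing the stream as a whole.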

Non-goals

  • Reworking ranking algorithm quality/scoring weights in this issue.
  • Implementing offset/page-number pagination as the primary backend protocol.

Labels: enhancement (New feature or request)