Skip to content

Bound _query_chunk_search_fallback result set with LIMIT#39

Merged
duck-lint merged 4 commits into20260323_implementation-1_lockstepfrom
copilot/sub-pr-38
Mar 23, 2026
Merged

Bound _query_chunk_search_fallback result set with LIMIT#39
duck-lint merged 4 commits into20260323_implementation-1_lockstepfrom
copilot/sub-pr-38

Conversation

Copy link
Contributor

Copilot AI commented Mar 23, 2026

The projection-substring fallback path issued an unbounded SELECT over chunk_search, meaning short or common queries could materialize the entire table into Python before ranking — a latency and memory hazard whenever FTS5 is unavailable.

Changes

  • agent/corpus_db.py
    • Added limit: int keyword parameter to _query_chunk_search_fallback()
    • Appended LIMIT ? (guarded with max(1, int(limit))) to the fallback SQL query
    • Both call sites now pass fetch_limit (max(50, safe_limit * 10)), matching the cap already applied on the FTS5 path
# Before
candidates = _query_chunk_search_fallback(conn, query_text=raw_query)

# After
candidates = _query_chunk_search_fallback(conn, query_text=raw_query, limit=fetch_limit)
  • tests/test_retrieval_contract.py
    • Added FallbackLimitRegressionTests class with three focused tests exercising _query_chunk_search_fallback directly against an in-memory SQLite DB:
      • Asserts result count is capped at the given limit when many more rows match (prevents reintroduction of the unbounded query)
      • Asserts all rows are returned when total matches fall below the limit
      • Asserts every returned row includes a backend_score float

🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.

Copilot AI changed the title [WIP] [WIP] Address feedback from PR #38 for implementation 1 adjustments Bound _query_chunk_search_fallback result set with LIMIT Mar 23, 2026
Copilot AI requested a review from duck-lint March 23, 2026 19:55
@duck-lint duck-lint marked this pull request as ready for review March 23, 2026 19:57
Copilot AI review requested due to automatic review settings March 23, 2026 19:57
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR bounds the projection-substring (“fallback”) lexical retrieval path in agent/corpus_db.py so that when FTS5 is unavailable, the system no longer risks materializing an unbounded chunk_search result set into Python before ranking.

Changes:

  • Pass a computed fetch_limit into _query_chunk_search_fallback() from both fallback call sites.
  • Add a required limit kwarg to _query_chunk_search_fallback() and apply LIMIT ? (sanitized via max(1, int(limit))) to the SQL query.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +463 to +466
candidates = _query_chunk_search_fallback(conn, query_text=raw_query, limit=fetch_limit)
else:
actual_backend = "projection_substring"
candidates = _query_chunk_search_fallback(conn, query_text=raw_query)
candidates = _query_chunk_search_fallback(conn, query_text=raw_query, limit=fetch_limit)
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback path is now bounded via limit=fetch_limit, but there’s no regression test asserting that the projection-substring backend doesn’t return an unbounded candidate set when FTS5 is unavailable. Adding a focused test (e.g., patching _ensure_chunk_search_fts to false and inserting many chunk_search rows, then asserting the candidate fetch is capped) would prevent accidental reintroduction of the unbounded query.

Copilot uses AI. Check for mistakes.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit 0c8fc41. Added FallbackLimitRegressionTests to tests/test_retrieval_contract.py with three focused tests that exercise _query_chunk_search_fallback directly against an in-memory SQLite DB: (1) result count is capped at the given limit when 200 matching rows exist, (2) all rows are returned when total matches fall below the limit, and (3) every returned row includes a backend_score float.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: duck-lint <250741170+duck-lint@users.noreply.github.com>
@duck-lint duck-lint merged commit ce09819 into 20260323_implementation-1_lockstep Mar 23, 2026
4 checks passed
@duck-lint duck-lint deleted the copilot/sub-pr-38 branch March 23, 2026 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants