Bound _query_chunk_search_fallback result set with LIMIT #39
duck-lint merged 4 commits into 20260323_implementation-1_lockstep
Co-authored-by: duck-lint <250741170+duck-lint@users.noreply.github.com> Agent-Logs-Url: https://github.com/duck-lint/local-agent/sessions/bed97c53-aa6e-4946-87d6-4562d2cf0d13
_query_chunk_search_fallback result set with LIMIT
Pull request overview
This PR bounds the projection-substring (“fallback”) lexical retrieval path in agent/corpus_db.py so that when FTS5 is unavailable, the system no longer risks materializing an unbounded chunk_search result set into Python before ranking.
Changes:
- Pass a computed fetch_limit into _query_chunk_search_fallback() from both fallback call sites.
- Add a required limit kwarg to _query_chunk_search_fallback() and apply LIMIT ? (sanitized via max(1, int(limit))) to the SQL query.
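As an illustration of the change listed above, here is a minimal sketch of a bounded substring fallback. It is not the code from agent/corpus_db.py: the function name `query_fallback_sketch`, the two-column `chunk_search` schema, and the occurrence-count scoring are all assumptions made for the example. Only the `LIMIT ?` clause and the `max(1, int(limit))` guard come from the PR.

```python
import sqlite3


def query_fallback_sketch(conn, *, query_text: str, limit: int) -> list[dict]:
    """Hypothetical stand-in for _query_chunk_search_fallback()."""
    # Guard described in the PR: never pass a LIMIT below 1 to SQLite.
    bounded = max(1, int(limit))
    rows = conn.execute(
        "SELECT chunk_key, text FROM chunk_search "
        "WHERE instr(lower(text), lower(?)) > 0 "
        "ORDER BY chunk_key "
        "LIMIT ?",
        (query_text, bounded),
    ).fetchall()
    needle = query_text.lower()
    # Assumed scoring: number of substring occurrences, as a float.
    return [
        {"chunk_key": key, "text": text, "backend_score": float(text.lower().count(needle))}
        for key, text in rows
    ]


# Assumed schema: the real chunk_search table may have more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunk_search (chunk_key TEXT PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO chunk_search VALUES (?, ?)",
    [(f"c{i:03d}", f"alpha row {i}") for i in range(200)],
)

capped = query_fallback_sketch(conn, query_text="alpha", limit=25)
floor = query_fallback_sketch(conn, query_text="alpha", limit=0)
everything = query_fallback_sketch(conn, query_text="alpha", limit=500)
```

With 200 matching rows, the first call fetches 25 rows, the second is clamped to a single row by the guard, and the third returns all 200 because the limit exceeds the match count.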
        candidates = _query_chunk_search_fallback(conn, query_text=raw_query, limit=fetch_limit)
    else:
        actual_backend = "projection_substring"
-       candidates = _query_chunk_search_fallback(conn, query_text=raw_query)
+       candidates = _query_chunk_search_fallback(conn, query_text=raw_query, limit=fetch_limit)
The fallback path is now bounded via limit=fetch_limit, but there’s no regression test asserting that the projection-substring backend doesn’t return an unbounded candidate set when FTS5 is unavailable. Adding a focused test (e.g., patching _ensure_chunk_search_fts to false and inserting many chunk_search rows, then asserting the candidate fetch is capped) would prevent accidental reintroduction of the unbounded query.
Done in commit 0c8fc41. Added FallbackLimitRegressionTests to tests/test_retrieval_contract.py with three focused tests that exercise _query_chunk_search_fallback directly against an in-memory SQLite DB: (1) result count is capped at the given limit when 200 matching rows exist, (2) all rows are returned when total matches fall below the limit, and (3) every returned row includes a backend_score float.
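For readers who want a picture of the three tests described above, the following is a rough sketch using `unittest` and an in-memory SQLite database. It does not reproduce FallbackLimitRegressionTests: the stand-in function `fallback_stand_in`, the class name `FallbackLimitSketchTests`, the table schema, and the scoring are assumptions, since the real function and tests are not shown in this conversation.

```python
import sqlite3
import unittest


def fallback_stand_in(conn, *, query_text, limit):
    # Hypothetical stand-in for _query_chunk_search_fallback(); the real
    # function lives in agent/corpus_db.py and may differ in every detail
    # except the bounded LIMIT.
    rows = conn.execute(
        "SELECT chunk_key, text FROM chunk_search WHERE text LIKE ? LIMIT ?",
        (f"%{query_text}%", max(1, int(limit))),
    ).fetchall()
    return [
        {"chunk_key": key, "backend_score": len(query_text) / max(1, len(text))}
        for key, text in rows
    ]


class FallbackLimitSketchTests(unittest.TestCase):
    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        # Assumed two-column schema for illustration only.
        self.conn.execute("CREATE TABLE chunk_search (chunk_key TEXT, text TEXT)")

    def tearDown(self):
        self.conn.close()

    def _insert_matches(self, count):
        self.conn.executemany(
            "INSERT INTO chunk_search VALUES (?, ?)",
            [(f"c{i}", f"needle in chunk {i}") for i in range(count)],
        )

    def test_result_count_is_capped(self):
        self._insert_matches(200)
        rows = fallback_stand_in(self.conn, query_text="needle", limit=50)
        self.assertEqual(len(rows), 50)

    def test_all_rows_returned_below_limit(self):
        self._insert_matches(7)
        rows = fallback_stand_in(self.conn, query_text="needle", limit=50)
        self.assertEqual(len(rows), 7)

    def test_rows_carry_float_backend_score(self):
        self._insert_matches(5)
        rows = fallback_stand_in(self.conn, query_text="needle", limit=50)
        self.assertTrue(all(isinstance(r["backend_score"], float) for r in rows))


def run_sketch_tests():
    # Load and run the three sketch tests, returning the TestResult.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FallbackLimitSketchTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_sketch_tests()` runs the three cases; in the real repository the equivalent tests would target `_query_chunk_search_fallback` itself rather than a stand-in.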
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: duck-lint <250741170+duck-lint@users.noreply.github.com>
…orcement Co-authored-by: duck-lint <250741170+duck-lint@users.noreply.github.com> Agent-Logs-Url: https://github.com/duck-lint/local-agent/sessions/ad4ff7de-5ca2-4fd0-9a25-44ff1603d5f3
Merged commit ce09819 into 20260323_implementation-1_lockstep
The projection-substring fallback path issued an unbounded SELECT over chunk_search, meaning short or common queries could materialize the entire table into Python before ranking. This is a latency and memory hazard whenever FTS5 is unavailable.

Changes

agent/corpus_db.py
- Add a limit: int keyword parameter to _query_chunk_search_fallback()
- Apply LIMIT ? (guarded with max(1, int(limit))) to the fallback SQL query
- Pass fetch_limit (max(50, safe_limit * 10)) from both fallback call sites, matching the cap already applied on the FTS5 path

tests/test_retrieval_contract.py
- Add a FallbackLimitRegressionTests class with three focused tests exercising _query_chunk_search_fallback directly against an in-memory SQLite DB:
  - result count is capped at the given limit when 200 matching rows exist
  - all rows are returned when total matches fall below the limit
  - every returned row includes a backend_score float
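The over-fetch cap quoted above, max(50, safe_limit * 10), can be worked through with a few values. The sketch below is illustrative only: the helper names `fetch_limit_for` and `rank_and_trim` are hypothetical, and the ranking step (sort by `backend_score`, keep the top `safe_limit`) is an assumption about how the candidates are consumed after fetching, not something this PR shows.

```python
def fetch_limit_for(safe_limit: int) -> int:
    # Formula quoted in the PR description: max(50, safe_limit * 10).
    return max(50, safe_limit * 10)


def rank_and_trim(candidates, safe_limit):
    # Assumed ranking step: highest backend_score first, then keep only
    # the number of results the caller asked for.
    ranked = sorted(candidates, key=lambda c: c["backend_score"], reverse=True)
    return ranked[:safe_limit]


# Small requests are floored at 50 candidates; larger ones fetch 10x.
caps = {n: fetch_limit_for(n) for n in (1, 5, 8, 20)}

safe_limit = 8
fetch_limit = fetch_limit_for(safe_limit)
# Synthetic candidates standing in for rows fetched with LIMIT fetch_limit.
candidates = [
    {"chunk_key": f"c{i}", "backend_score": float(i % 7)} for i in range(fetch_limit)
]
top = rank_and_trim(candidates, safe_limit)
```

Here `caps` comes out as `{1: 50, 5: 50, 8: 80, 20: 200}`, so the ranking step always sees at least 50 candidates while the fetch stays proportional to the requested result count.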