feat(query): LLM function-call fallback for structured filter extraction
Summary
Add a feature-flagged LLM fallback stage that extracts structured FilterDSL arguments from natural language queries when deterministic router rules are ambiguous or insufficient. The LLM output is treated as untrusted input and must pass strict server-side schema/allowlist validation before execution.
This closes the current capability gap where routing may choose a strategy but no robust structured filter extraction occurs for complex temporal/attribute phrasing.
Problem Statement
Current query behavior relies on:
- deterministic rule-based routing, and
- optional LLM strategy classification fallback.
However, complex natural-language constraints (temporal ranges, multi-attribute constraints, mixed phrasing) are not consistently converted into executable filter arguments. This causes over-reliance on semantic ranking and brittle regex patches.
Scope
In scope
- New LLM-based filter extraction module producing strict JSON function-call-like output.
- Feature-flagged fallback path in query execution pipeline.
- Mapping/normalization to the existing `FilterDSL` shape.
- Strict validation via existing `translateFilter` constraints and field/operator allowlists.
- Timeout + circuit-breaker aware fallback to current behavior.
- Routing metadata extension to indicate whether inferred filters were applied.
- Unit tests for parser behavior and integration tests for fallback execution path.
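The timeout-aware fallback in the scope list could be sketched as a small wrapper that races the parser call against a deadline and resolves to `null` (i.e. "no inferred filter, keep current behavior") on expiry or error. This is an illustrative sketch, not the actual implementation; the real code would also feed the circuit breaker.

```typescript
// Hypothetical sketch: race a parser promise against a deadline.
// Expiry and rejection both resolve to null so the query pipeline
// continues with its existing (pre-feature) behavior unchanged.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | null> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(null), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      () => { clearTimeout(timer); resolve(null); }, // errors also fall back
    );
  });
}
```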
Out of scope
- Replacing deterministic rules.
- Changing existing filter field semantics in `pg-helpers.ts`.
- Reworking graph traversal strategy.
- UI/CLI UX redesign.
Technical Approach
1) New module: filter extraction service
Create `api/src/services/query-filter-parser.ts` with:

- `extractStructuredFilter(request: { query: string; strategy: QueryStrategy; existingFilter?: FilterDSL | Record<string, unknown> }): Promise<FilterDSL | null>`
- Provider behavior:
  - `EMBED_PROVIDER=openai`: use chat completions with a JSON schema-constrained response.
  - default ollama: use the `/api/generate` JSON-only prompt contract.
- Output contract:
  - JSON object matching `FilterDSL` (`conditions`, optional `combine`).
  - No unsupported fields/operators.
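A minimal sketch of the output-contract check, assuming a `FilterDSL` shape of `{ conditions, combine? }` (the exact field names are assumptions; `translateFilter` remains the authoritative validator downstream):

```typescript
// Hypothetical types mirroring the FilterDSL output contract above.
type FilterCondition = { field: string; op: string; value: unknown };
type FilterDSL = { conditions: FilterCondition[]; combine?: "and" | "or" };

// Parse raw LLM text into a FilterDSL candidate, or null on any failure.
// This only checks shape; callers must still run translateFilter validation.
function parseFilterResponse(raw: string): FilterDSL | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return null;
    const { conditions, combine } = parsed as Record<string, unknown>;
    if (!Array.isArray(conditions) || conditions.length === 0) return null;
    for (const c of conditions) {
      if (typeof c !== "object" || c === null) return null;
      const cond = c as Record<string, unknown>;
      if (typeof cond.field !== "string" || typeof cond.op !== "string") return null;
    }
    if (combine !== undefined && combine !== "and" && combine !== "or") return null;
    return parsed as FilterDSL;
  } catch {
    return null; // malformed JSON -> treat as "no inferred filter"
  }
}
```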
2) Pipeline integration
In api/src/services/query.ts:
- Run deterministic path first.
- Invoke LLM filter extraction only when all of the following are true:
  - the feature flag is enabled,
  - the request has a non-empty query,
  - no explicit user `filter` is provided,
  - the routing method is `default`, `rule_fallback`, or a similar low-confidence path.
- If the parser returns a valid `FilterDSL`, apply it to the metadata/hybrid/semantic path where relevant.
- If the parser fails, returns invalid output, or times out, continue with existing behavior unchanged.
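The gating conditions above could be collapsed into one predicate in `query.ts`. A sketch, with illustrative types (`RoutingMethod`, `QueryRequest` are assumptions, not the real router types):

```typescript
// Hypothetical routing method union; real values come from the router.
type RoutingMethod = "default" | "rule" | "rule_fallback" | "llm";
interface QueryRequest { query?: string; filter?: unknown }

// All four gating conditions from the pipeline-integration list above.
function shouldAttemptFilterExtraction(
  req: QueryRequest,
  routingMethod: RoutingMethod,
  flagEnabled: boolean,
): boolean {
  if (!flagEnabled) return false;                          // feature flag gate
  if (!req.query || req.query.trim() === "") return false; // need a query
  if (req.filter !== undefined) return false;              // never override explicit filter
  // only ambiguous routing outcomes qualify
  return routingMethod === "default" || routingMethod === "rule_fallback";
}
```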
3) Validation/safety
- All parsed output must pass `translateFilter` validation before query execution.
- Unknown field/operator -> discard the parsed filter and fall back.
- Never execute raw LLM-supplied SQL fragments.
- Record structured telemetry (without secrets): parse attempt, success/failure reason, latency bucket.
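The "unknown field/operator -> discard" rule could look like the sketch below. The allowlisted names are placeholders for illustration; the real check lives in the existing `translateFilter` validation:

```typescript
// Hypothetical allowlists; placeholder values, not the real schema.
const ALLOWED_FIELDS = new Set(["vendor", "doc_type", "date"]);
const ALLOWED_OPS = new Set(["eq", "neq", "gte", "lte", "in"]);

type Condition = { field: string; op: string; value: unknown };

// Returns the conditions unchanged if every one is allowlisted, else null
// so the caller falls back to existing behavior. Note the whole filter is
// discarded on any unknown field/op: a partially applied filter could
// silently change result semantics.
function allowlistFilter(conditions: Condition[]): Condition[] | null {
  for (const c of conditions) {
    if (!ALLOWED_FIELDS.has(c.field) || !ALLOWED_OPS.has(c.op)) {
      return null;
    }
  }
  return conditions;
}
```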
4) Configuration
Add env vars:
- `ROUTER_FILTER_LLM_ENABLED` (default `false`)
- `ROUTER_FILTER_LLM_TIMEOUT_MS` (default `1500`)
- `ROUTER_FILTER_LLM_MODEL` (default: provider-appropriate generative model)
- Reuse the existing circuit breaker strategy where possible; if a shared breaker is not practical, add a dedicated lightweight breaker with the same defaults as the router classifier.
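A sketch of reading these env vars with the stated defaults (the config shape and function name are assumptions; callers would pass `process.env`):

```typescript
// Hypothetical config reader for the new env vars, applying the
// defaults stated above (flag off, 1500 ms timeout).
interface FilterLlmConfig { enabled: boolean; timeoutMs: number; model?: string }

function readFilterLlmConfig(env: Record<string, string | undefined>): FilterLlmConfig {
  const timeout = Number.parseInt(env.ROUTER_FILTER_LLM_TIMEOUT_MS ?? "", 10);
  return {
    enabled: env.ROUTER_FILTER_LLM_ENABLED === "true",          // default false
    timeoutMs: Number.isFinite(timeout) && timeout > 0 ? timeout : 1500,
    model: env.ROUTER_FILTER_LLM_MODEL,  // provider-appropriate default chosen elsewhere
  };
}
```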
Dependencies
- Parent EPIC: EPIC(feat): Hybrid Query Router — metadata, graph, and semantic query strategies #109
- Related implementation: feat(query): Router and classifier (rules + LLM fallback) #112 (router/classifier), feat(query): Metadata strategy engine for structured and temporal filters #116 (metadata strategy), feat(query): Hybrid execution and result merge/rerank #110 (hybrid execution)
- No schema migration required.
Risks and Mitigations
- Risk: False-positive filters degrade recall.
  - Mitigation: Feature flag default off; apply only when no explicit filter; fall back on validation failures; preserve semantic fallback.
- Risk: Latency increase.
  - Mitigation: Tight timeout (<=1500ms), circuit breaker, call only in ambiguous cases.
- Risk: Unsafe execution from LLM output.
  - Mitigation: Strict typed schema + existing `translateFilter` validation + no SQL generation by the LLM.
Acceptance Criteria (AC)
- A new filter-parser service exists and returns either a valid `FilterDSL` or `null`.
- Feature flag `ROUTER_FILTER_LLM_ENABLED` gates all LLM filter extraction behavior.
- Parser is never called when the user already supplies an explicit `filter`.
- Parsed filters are validated through existing filter validation logic before use.
- Invalid/unsupported parser output does not fail the request; system falls back to existing behavior.
- Timeout/circuit-break behavior prevents repeated slow/failing parser calls.
- Query response includes machine-readable indication when inferred filters were applied.
- Existing router/classifier tests remain green.
- New unit tests cover parser success/failure/timeout/invalid JSON cases.
- New integration tests cover at least one temporal query and one multi-constraint query with inferred filter success.
Definition of Done (DoD)
- Implementation merged with feature flag default `false`.
- Tests added and passing in CI for parser + query integration paths.
- `.env.example` and API docs updated with new env vars and behavior.
- Observability fields for parser attempt/result added to logs/metrics.
- Manual verification documented for:
  - `all openai invoices from 2023 and 2024`
  - one non-temporal multi-constraint query.
Validation Plan
- Unit: parser response normalization and validation mapping.
- Unit: timeout and malformed output handling.
- Unit: fallback behavior when parser fails.
- Integration: query endpoint with feature flag on and parser success.
- Integration: query endpoint with feature flag on and parser failure, assert unchanged baseline behavior.
- Regression: existing router/query tests pass unchanged.
Non-goals
- Full natural-language-to-SQL generation.
- Replacing current deterministic temporal parsing immediately.
- CLI-side LLM parsing.
AC/DoD/Non-goal Coverage Matrix
| Item | Type (AC/DoD/Non-goal) | Status (Met/Partial/Unmet/Unverified) | Evidence (spec/tests/behavior) | Notes |
|---|---|---|---|---|
| A new filter-parser service exists and returns either valid FilterDSL or null. | AC | Unverified | This issue spec | To be implemented |
| Feature flag ROUTER_FILTER_LLM_ENABLED gates all LLM filter extraction behavior. | AC | Unverified | This issue spec | To be implemented |
| Parser is never called when user already supplies an explicit filter. | AC | Unverified | This issue spec | To be implemented |
| Parsed filters are validated through existing filter validation logic before use. | AC | Unverified | This issue spec | To be implemented |
| Invalid/unsupported parser output does not fail the request; system falls back to existing behavior. | AC | Unverified | This issue spec | To be implemented |
| Timeout/circuit-break behavior prevents repeated slow/failing parser calls. | AC | Unverified | This issue spec | To be implemented |
| Query response includes machine-readable indication when inferred filters were applied. | AC | Unverified | This issue spec | To be implemented |
| Existing router/classifier tests remain green. | AC | Unverified | CI tests | To be implemented |
| New unit tests cover parser success/failure/timeout/invalid JSON cases. | AC | Unverified | Test plan section | To be implemented |
| New integration tests cover at least one temporal query and one multi-constraint query with inferred filter success. | AC | Unverified | Test plan section | To be implemented |
| Implementation merged with feature flag default false. | DoD | Unverified | PR + env defaults | To be implemented |
| Tests added and passing in CI for parser + query integration paths. | DoD | Unverified | CI run | To be implemented |
| .env.example and API docs updated with new env vars and behavior. | DoD | Unverified | docs changes | To be implemented |
| Observability fields for parser attempt/result added to logs/metrics. | DoD | Unverified | log/metric output | To be implemented |
| Manual verification documented for: all openai invoices from 2023 and 2024; one non-temporal multi-constraint query. | DoD | Unverified | manual report | To be implemented |
| Full natural-language-to-SQL generation. | Non-goal | Unverified | Non-goals section | Explicitly out of scope |
| Replacing current deterministic temporal parsing immediately. | Non-goal | Unverified | Non-goals section | Explicitly out of scope |
| CLI-side LLM parsing. | Non-goal | Unverified | Non-goals section | Explicitly out of scope |