
feat(query): LLM function-call fallback for structured filter extraction #130

@mfittko

Description

Summary

Add a feature-flagged LLM fallback stage that extracts structured FilterDSL arguments from natural language queries when deterministic router rules are ambiguous or insufficient. The LLM output is treated as untrusted input and must pass strict server-side schema/allowlist validation before execution.

This closes the current capability gap in which routing may choose a strategy, but no robust structured filter extraction occurs for complex temporal or attribute phrasing.

Problem Statement

Current query behavior relies on:

  1. deterministic rule-based routing, and
  2. optional LLM strategy classification fallback.

However, complex natural-language constraints (temporal ranges, multi-attribute constraints, mixed phrasing) are not consistently converted into executable filter arguments. This causes over-reliance on semantic ranking and brittle regex patches.

Scope

In scope

  • New LLM-based filter extraction module producing strict JSON function-call-like output.
  • Feature-flagged fallback path in query execution pipeline.
  • Mapping/normalization to existing FilterDSL shape.
  • Strict validation via existing translateFilter constraints and field/operator allowlists.
  • Timeout + circuit-breaker aware fallback to current behavior.
  • Routing metadata extension to indicate whether inferred filters were applied.
  • Unit tests for parser behavior and integration tests for fallback execution path.

Out of scope

  • Replacing deterministic rules.
  • Changing existing filter field semantics in pg-helpers.ts.
  • Reworking graph traversal strategy.
  • UI/CLI UX redesign.

Technical Approach

1) New module: filter extraction service

Create api/src/services/query-filter-parser.ts with:

  • extractStructuredFilter(request: { query: string; strategy: QueryStrategy; existingFilter?: FilterDSL | Record<string, unknown> }): Promise<FilterDSL | null>
  • Provider behavior:
    • EMBED_PROVIDER=openai: use chat completions with JSON schema-constrained response.
    • default (ollama): use the /api/generate JSON-only prompt contract.
  • Output contract:
    • JSON object matching FilterDSL (conditions, optional combine).
    • No unsupported fields/operators.
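
The output contract above can be sketched as a defensive parse step. This is a minimal illustration, not the real implementation: the `FilterDSL` shape here (`conditions`, optional `combine`) is a simplified stand-in for the type in the api codebase, and `parseModelOutput` is a hypothetical helper name.

```typescript
// Hypothetical simplified shapes -- the real FilterDSL lives in the api codebase.
type FilterCondition = { field: string; operator: string; value: unknown };
type FilterDSL = { conditions: FilterCondition[]; combine?: "AND" | "OR" };

// Parse the model's raw text into a FilterDSL, returning null on any deviation
// from the contract (malformed JSON, wrong shape, unsupported combine value).
function parseModelOutput(raw: string): FilterDSL | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // malformed JSON counts as a parser failure, never an error
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const obj = parsed as Record<string, unknown>;
  if (!Array.isArray(obj.conditions)) return null;
  for (const c of obj.conditions) {
    if (typeof c !== "object" || c === null) return null;
    const cond = c as Record<string, unknown>;
    if (
      typeof cond.field !== "string" ||
      typeof cond.operator !== "string" ||
      !("value" in cond)
    ) {
      return null;
    }
  }
  if (obj.combine !== undefined && obj.combine !== "AND" && obj.combine !== "OR") {
    return null;
  }
  return obj as unknown as FilterDSL;
}
```

Returning `null` rather than throwing keeps the fallback path trivial for callers: any deviation from the contract simply means "no inferred filter".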

2) Pipeline integration

In api/src/services/query.ts:

  • Run deterministic path first.
  • Invoke LLM filter extraction only when all conditions are true:
    • feature flag enabled,
    • request has non-empty query,
    • no explicit user filter provided,
    • routing method is default, rule_fallback, or a comparable low-confidence path.
  • If parser returns valid FilterDSL, apply it to metadata/hybrid/semantic path where relevant.
  • If parser fails/invalid/times out, continue existing behavior unchanged.

3) Validation/safety

  • All parsed output must pass translateFilter validation before query execution.
  • Unknown field/operator -> discard the parsed filter and fall back to existing behavior.
  • Never execute raw LLM-supplied SQL fragments.
  • Record structured telemetry (without secrets): parse attempt, success/failure reason, latency bucket.
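
A minimal sketch of the discard-on-violation rule, assuming placeholder allowlists; the real checks belong to the existing translateFilter constraints, and `validateOrDiscard` is a hypothetical name:

```typescript
// Placeholder allowlists -- stand-ins for the real field/operator constraints
// enforced by translateFilter in the api codebase.
const ALLOWED_FIELDS = new Set(["vendor", "date", "amount"]);
const ALLOWED_OPERATORS = new Set(["eq", "gte", "lte", "in"]);

type Condition = { field: string; operator: string; value: unknown };
type Filter = { conditions: Condition[]; combine?: "AND" | "OR" };

// Return the filter only if every field and operator is allowlisted; otherwise
// null, so the caller falls back instead of failing the request. The LLM never
// supplies SQL -- only these structured arguments reach the query layer.
function validateOrDiscard(filter: Filter): Filter | null {
  for (const c of filter.conditions) {
    if (!ALLOWED_FIELDS.has(c.field) || !ALLOWED_OPERATORS.has(c.operator)) {
      return null;
    }
  }
  return filter;
}
```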

4) Configuration

Add env vars:

  • ROUTER_FILTER_LLM_ENABLED (default false)
  • ROUTER_FILTER_LLM_TIMEOUT_MS (default 1500)
  • ROUTER_FILTER_LLM_MODEL (default provider-appropriate generative model)
  • Reuse existing circuit breaker strategy where possible; if shared breaker is not practical, add dedicated lightweight breaker with same defaults as router classifier.
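
Reading these env vars with the defaults from this spec could look like the following sketch; `loadFilterLlmConfig` and `FilterLlmConfig` are hypothetical names, and the provider-appropriate model default is resolved elsewhere:

```typescript
interface FilterLlmConfig {
  enabled: boolean;   // ROUTER_FILTER_LLM_ENABLED, default false
  timeoutMs: number;  // ROUTER_FILTER_LLM_TIMEOUT_MS, default 1500
  model?: string;     // ROUTER_FILTER_LLM_MODEL; provider default applied later
}

// Env is passed in explicitly to keep the loader trivially testable.
function loadFilterLlmConfig(
  env: Record<string, string | undefined>
): FilterLlmConfig {
  return {
    enabled: env.ROUTER_FILTER_LLM_ENABLED === "true",
    timeoutMs: Number(env.ROUTER_FILTER_LLM_TIMEOUT_MS ?? 1500),
    model: env.ROUTER_FILTER_LLM_MODEL,
  };
}
```

Parsing the flag with a strict `=== "true"` comparison keeps the feature off for any unset or malformed value, matching the default-off requirement.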

Dependencies

Risks and Mitigations

  1. Risk: False-positive filters degrade recall.

    • Mitigation: Feature flag defaults to off; apply only when no explicit filter is provided; fall back on validation failures; preserve semantic fallback.
  2. Risk: Latency increase.

    • Mitigation: Tight timeout (<=1500ms), circuit breaker, only call in ambiguous cases.
  3. Risk: Unsafe execution from LLM output.

    • Mitigation: Strict typed schema + existing translateFilter validation + no SQL generation by LLM.
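
The latency mitigation can be sketched as a generic timeout guard. This is an assumed pattern, not the project's existing breaker: `withTimeout` is a hypothetical helper that races the parser call against a deadline and degrades to `null` on timeout or error.

```typescript
// Race a parser call against a deadline; timeout or rejection both degrade to
// null so the query pipeline continues with existing behavior unchanged.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<null>((resolve) => {
    timer = setTimeout(() => resolve(null), ms);
  });
  try {
    return await Promise.race([work, deadline]);
  } catch {
    return null; // parser errors never surface to the request
  } finally {
    if (timer !== undefined) clearTimeout(timer); // avoid a dangling timer
  }
}
```

A shared or dedicated circuit breaker would wrap this call to stop invoking the parser after repeated failures, per the configuration section above.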

Acceptance Criteria (AC)

  • A new filter-parser service exists and returns either valid FilterDSL or null.
  • Feature flag ROUTER_FILTER_LLM_ENABLED gates all LLM filter extraction behavior.
  • Parser is never called when user already supplies an explicit filter.
  • Parsed filters are validated through existing filter validation logic before use.
  • Invalid/unsupported parser output does not fail the request; system falls back to existing behavior.
  • Timeout/circuit-break behavior prevents repeated slow/failing parser calls.
  • Query response includes machine-readable indication when inferred filters were applied.
  • Existing router/classifier tests remain green.
  • New unit tests cover parser success/failure/timeout/invalid JSON cases.
  • New integration tests cover at least one temporal query and one multi-constraint query with inferred filter success.

Definition of Done (DoD)

  • Implementation merged with feature flag default false.
  • Tests added and passing in CI for parser + query integration paths.
  • .env.example and API docs updated with new env vars and behavior.
  • Observability fields for parser attempt/result added to logs/metrics.
  • Manual verification documented for:
    • the query "all openai invoices from 2023 and 2024"
    • one non-temporal multi-constraint query.

Validation Plan

  1. Unit: parser response normalization and validation mapping.
  2. Unit: timeout and malformed output handling.
  3. Unit: fallback behavior when parser fails.
  4. Integration: query endpoint with feature flag on and parser success.
  5. Integration: query endpoint with feature flag on and parser failure, assert unchanged baseline behavior.
  6. Regression: existing router/query tests pass unchanged.

Non-goals

  • Full natural-language-to-SQL generation.
  • Replacing current deterministic temporal parsing immediately.
  • CLI-side LLM parsing.

AC/DoD/Non-goal Coverage Matrix

| Item | Type (AC/DoD/Non-goal) | Status (Met/Partial/Unmet/Unverified) | Evidence (spec/tests/behavior) | Notes |
| --- | --- | --- | --- | --- |
| A new filter-parser service exists and returns either valid FilterDSL or null. | AC | Unverified | This issue spec | To be implemented |
| Feature flag ROUTER_FILTER_LLM_ENABLED gates all LLM filter extraction behavior. | AC | Unverified | This issue spec | To be implemented |
| Parser is never called when user already supplies an explicit filter. | AC | Unverified | This issue spec | To be implemented |
| Parsed filters are validated through existing filter validation logic before use. | AC | Unverified | This issue spec | To be implemented |
| Invalid/unsupported parser output does not fail the request; system falls back to existing behavior. | AC | Unverified | This issue spec | To be implemented |
| Timeout/circuit-break behavior prevents repeated slow/failing parser calls. | AC | Unverified | This issue spec | To be implemented |
| Query response includes machine-readable indication when inferred filters were applied. | AC | Unverified | This issue spec | To be implemented |
| Existing router/classifier tests remain green. | AC | Unverified | CI tests | To be implemented |
| New unit tests cover parser success/failure/timeout/invalid JSON cases. | AC | Unverified | Test plan section | To be implemented |
| New integration tests cover at least one temporal query and one multi-constraint query with inferred filter success. | AC | Unverified | Test plan section | To be implemented |
| Implementation merged with feature flag default false. | DoD | Unverified | PR + env defaults | To be implemented |
| Tests added and passing in CI for parser + query integration paths. | DoD | Unverified | CI run | To be implemented |
| .env.example and API docs updated with new env vars and behavior. | DoD | Unverified | docs changes | To be implemented |
| Observability fields for parser attempt/result added to logs/metrics. | DoD | Unverified | log/metric output | To be implemented |
| Manual verification documented for: "all openai invoices from 2023 and 2024"; one non-temporal multi-constraint query. | DoD | Unverified | manual report | To be implemented |
| Full natural-language-to-SQL generation. | Non-goal | Unverified | Non-goals section | Explicitly out of scope |
| Replacing current deterministic temporal parsing immediately. | Non-goal | Unverified | Non-goals section | Explicitly out of scope |
| CLI-side LLM parsing. | Non-goal | Unverified | Non-goals section | Explicitly out of scope |

Metadata


Labels

enhancement (New feature or request)
