feat(query): LLM function-call fallback for structured filter extraction
Summary
Add a feature-flagged LLM fallback stage that extracts structured FilterDSL arguments from natural language queries when deterministic router rules are ambiguous or insufficient. The LLM output is treated as untrusted input and must pass strict server-side schema/allowlist validation before execution.
This closes the current capability gap where routing may choose a strategy but no robust structured filter extraction occurs for complex temporal/attribute phrasing.
Problem Statement
Current query behavior relies on:
- deterministic rule-based routing, and
- optional LLM strategy classification fallback.
However, complex natural-language constraints (temporal ranges, multi-attribute constraints, mixed phrasing) are not consistently converted into executable filter arguments. This causes over-reliance on semantic ranking and brittle regex patches.
Scope
In scope
- New LLM-based filter extraction module producing strict JSON function-call-like output.
- Feature-flagged fallback path in query execution pipeline.
- Mapping/normalization to the existing `FilterDSL` shape.
- Strict validation via existing `translateFilter` constraints and field/operator allowlists.
- Timeout + circuit-breaker aware fallback to current behavior.
- Routing metadata extension to indicate whether inferred filters were applied.
- Unit tests for parser behavior and integration tests for fallback execution path.
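The timeout-aware fallback in the scope list could be sketched as a small wrapper that races the parser call against a deadline and resolves to `null` (i.e. "no inferred filter, keep current behavior") on expiry or error. This is an illustrative sketch, not the actual implementation; the real code would also feed the circuit breaker.

```typescript
// Hypothetical sketch: race a parser promise against a deadline.
// Expiry and rejection both resolve to null so the query pipeline
// continues with its existing (pre-feature) behavior unchanged.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | null> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(null), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      () => { clearTimeout(timer); resolve(null); }, // errors also fall back
    );
  });
}
```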
Out of scope
- Replacing deterministic rules.
- Changing existing filter field semantics in `pg-helpers.ts`.
- Reworking graph traversal strategy.
- UI/CLI UX redesign.
Technical Approach
1) New module: filter extraction service
Create `api/src/services/query-filter-parser.ts` with:

- `extractStructuredFilter(request: { query: string; strategy: QueryStrategy; existingFilter?: FilterDSL | Record<string, unknown> }): Promise<FilterDSL | null>`
- Provider behavior:
  - `EMBED_PROVIDER=openai`: use chat completions with a JSON schema-constrained response.
  - default ollama: use the `/api/generate` JSON-only prompt contract.
- Output contract:
  - JSON object matching `FilterDSL` (`conditions`, optional `combine`).
  - No unsupported fields/operators.
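A minimal sketch of the output-contract check, assuming a `FilterDSL` shape of `{ conditions, combine? }` (the exact field names are assumptions; `translateFilter` remains the authoritative validator downstream):

```typescript
// Hypothetical types mirroring the FilterDSL output contract above.
type FilterCondition = { field: string; op: string; value: unknown };
type FilterDSL = { conditions: FilterCondition[]; combine?: "and" | "or" };

// Parse raw LLM text into a FilterDSL candidate, or null on any failure.
// This only checks shape; callers must still run translateFilter validation.
function parseFilterResponse(raw: string): FilterDSL | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return null;
    const { conditions, combine } = parsed as Record<string, unknown>;
    if (!Array.isArray(conditions) || conditions.length === 0) return null;
    for (const c of conditions) {
      if (typeof c !== "object" || c === null) return null;
      const cond = c as Record<string, unknown>;
      if (typeof cond.field !== "string" || typeof cond.op !== "string") return null;
    }
    if (combine !== undefined && combine !== "and" && combine !== "or") return null;
    return parsed as FilterDSL;
  } catch {
    return null; // malformed JSON -> treat as "no inferred filter"
  }
}
```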
2) Pipeline integration
In api/src/services/query.ts:
- Run deterministic path first.
- Invoke LLM filter extraction only when all of the following are true:
  - the feature flag is enabled,
  - the request has a non-empty query,
  - no explicit user `filter` is provided,
  - the routing method is `default`, `rule_fallback`, or a similar low-confidence path.
- If the parser returns a valid `FilterDSL`, apply it to the metadata/hybrid/semantic path where relevant.
- If the parser fails, returns invalid output, or times out, continue with existing behavior unchanged.
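The gating conditions above could be collapsed into one predicate in `query.ts`. A sketch, with illustrative types (`RoutingMethod`, `QueryRequest` are assumptions, not the real router types):

```typescript
// Hypothetical routing method union; real values come from the router.
type RoutingMethod = "default" | "rule" | "rule_fallback" | "llm";
interface QueryRequest { query?: string; filter?: unknown }

// All four gating conditions from the pipeline-integration list above.
function shouldAttemptFilterExtraction(
  req: QueryRequest,
  routingMethod: RoutingMethod,
  flagEnabled: boolean,
): boolean {
  if (!flagEnabled) return false;                          // feature flag gate
  if (!req.query || req.query.trim() === "") return false; // need a query
  if (req.filter !== undefined) return false;              // never override explicit filter
  // only ambiguous routing outcomes qualify
  return routingMethod === "default" || routingMethod === "rule_fallback";
}
```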
3) Validation/safety
- All parsed output must pass `translateFilter` validation before query execution.
- Unknown field/operator -> discard the parsed filter and fall back.
- Never execute raw LLM-supplied SQL fragments.
- Record structured telemetry (without secrets): parse attempt, success/failure reason, latency bucket.
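The "unknown field/operator -> discard" rule could look like the sketch below. The allowlisted names are placeholders for illustration; the real check lives in the existing `translateFilter` validation:

```typescript
// Hypothetical allowlists; placeholder values, not the real schema.
const ALLOWED_FIELDS = new Set(["vendor", "doc_type", "date"]);
const ALLOWED_OPS = new Set(["eq", "neq", "gte", "lte", "in"]);

type Condition = { field: string; op: string; value: unknown };

// Returns the conditions unchanged if every one is allowlisted, else null
// so the caller falls back to existing behavior. Note the whole filter is
// discarded on any unknown field/op: a partially applied filter could
// silently change result semantics.
function allowlistFilter(conditions: Condition[]): Condition[] | null {
  for (const c of conditions) {
    if (!ALLOWED_FIELDS.has(c.field) || !ALLOWED_OPS.has(c.op)) {
      return null;
    }
  }
  return conditions;
}
```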
4) Configuration
Add env vars:
- `ROUTER_FILTER_LLM_ENABLED` (default `false`)
- `ROUTER_FILTER_LLM_TIMEOUT_MS` (default `1500`)
- `ROUTER_FILTER_LLM_MODEL` (default: provider-appropriate generative model)
- Reuse the existing circuit breaker strategy where possible; if a shared breaker is not practical, add a dedicated lightweight breaker with the same defaults as the router classifier.
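A sketch of reading these env vars with the stated defaults (the config shape and function name are assumptions; callers would pass `process.env`):

```typescript
// Hypothetical config reader for the new env vars, applying the
// defaults stated above (flag off, 1500 ms timeout).
interface FilterLlmConfig { enabled: boolean; timeoutMs: number; model?: string }

function readFilterLlmConfig(env: Record<string, string | undefined>): FilterLlmConfig {
  const timeout = Number.parseInt(env.ROUTER_FILTER_LLM_TIMEOUT_MS ?? "", 10);
  return {
    enabled: env.ROUTER_FILTER_LLM_ENABLED === "true",          // default false
    timeoutMs: Number.isFinite(timeout) && timeout > 0 ? timeout : 1500,
    model: env.ROUTER_FILTER_LLM_MODEL,  // provider-appropriate default chosen elsewhere
  };
}
```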
Dependencies
- Parent EPIC: EPIC(feat): Hybrid Query Router — metadata, graph, and semantic query strategies #109
- Related implementation: feat(query): Router and classifier (rules + LLM fallback) #112 (router/classifier), feat(query): Metadata strategy engine for structured and temporal filters #116 (metadata strategy), feat(query): Hybrid execution and result merge/rerank #110 (hybrid execution)
- No schema migration required.
Risks and Mitigations
- Risk: False-positive filters degrade recall.
  - Mitigation: Feature flag default off; apply only when no explicit filter; fall back on validation failures; preserve semantic fallback.
- Risk: Latency increase.
  - Mitigation: Tight timeout (<=1500ms), circuit breaker, call only in ambiguous cases.
- Risk: Unsafe execution from LLM output.
  - Mitigation: Strict typed schema + existing `translateFilter` validation + no SQL generation by the LLM.
Acceptance Criteria (AC)
- A new filter-parser service exists and returns either a valid `FilterDSL` or `null`.
- Feature flag `ROUTER_FILTER_LLM_ENABLED` gates all LLM filter extraction behavior.
- Parser is never called when the user already supplies an explicit `filter`.
- Parsed filters are validated through existing filter validation logic before use.
- Invalid/unsupported parser output does not fail the request; system falls back to existing behavior.
- Timeout/circuit-break behavior prevents repeated slow/failing parser calls.
- Query response includes machine-readable indication when inferred filters were applied.
- Existing router/classifier tests remain green.
- New unit tests cover parser success/failure/timeout/invalid JSON cases.
- New integration tests cover at least one temporal query and one multi-constraint query with inferred filter success.
Definition of Done (DoD)
- Implementation merged with feature flag default `false`.
- Tests added and passing in CI for parser + query integration paths.
- `.env.example` and API docs updated with new env vars and behavior.
- Observability fields for parser attempt/result added to logs/metrics.
- Manual verification documented for:
  - `all openai invoices from 2023 and 2024`
  - one non-temporal multi-constraint query.
Validation Plan
- Unit: parser response normalization and validation mapping.
- Unit: timeout and malformed output handling.
- Unit: fallback behavior when parser fails.
- Integration: query endpoint with feature flag on and parser success.
- Integration: query endpoint with feature flag on and parser failure, assert unchanged baseline behavior.
- Regression: existing router/query tests pass unchanged.
Non-goals
- Full natural-language-to-SQL generation.
- Replacing current deterministic temporal parsing immediately.
- CLI-side LLM parsing.
AC/DoD/Non-goal Coverage Matrix
| Item | Type (AC/DoD/Non-goal) | Status (Met/Partial/Unmet/Unverified) | Evidence (spec/tests/behavior) | Notes |
|---|---|---|---|---|
| A new filter-parser service exists and returns either valid FilterDSL or null. | AC | Unverified | This issue spec | To be implemented |
| Feature flag ROUTER_FILTER_LLM_ENABLED gates all LLM filter extraction behavior. | AC | Unverified | This issue spec | To be implemented |
| Parser is never called when user already supplies an explicit filter. | AC | Unverified | This issue spec | To be implemented |
| Parsed filters are validated through existing filter validation logic before use. | AC | Unverified | This issue spec | To be implemented |
| Invalid/unsupported parser output does not fail the request; system falls back to existing behavior. | AC | Unverified | This issue spec | To be implemented |
| Timeout/circuit-break behavior prevents repeated slow/failing parser calls. | AC | Unverified | This issue spec | To be implemented |
| Query response includes machine-readable indication when inferred filters were applied. | AC | Unverified | This issue spec | To be implemented |
| Existing router/classifier tests remain green. | AC | Unverified | CI tests | To be implemented |
| New unit tests cover parser success/failure/timeout/invalid JSON cases. | AC | Unverified | Test plan section | To be implemented |
| New integration tests cover at least one temporal query and one multi-constraint query with inferred filter success. | AC | Unverified | Test plan section | To be implemented |
| Implementation merged with feature flag default false. | DoD | Unverified | PR + env defaults | To be implemented |
| Tests added and passing in CI for parser + query integration paths. | DoD | Unverified | CI run | To be implemented |
| .env.example and API docs updated with new env vars and behavior. | DoD | Unverified | docs changes | To be implemented |
| Observability fields for parser attempt/result added to logs/metrics. | DoD | Unverified | log/metric output | To be implemented |
| Manual verification documented for: all openai invoices from 2023 and 2024; one non-temporal multi-constraint query. | DoD | Unverified | manual report | To be implemented |
| Full natural-language-to-SQL generation. | Non-goal | Unverified | Non-goals section | Explicitly out of scope |
| Replacing current deterministic temporal parsing immediately. | Non-goal | Unverified | Non-goals section | Explicitly out of scope |
| CLI-side LLM parsing. | Non-goal | Unverified | Non-goals section | Explicitly out of scope |