fix(issue-4): use LLM-aware estimator for circuit breaker cost resolution (with E2E) #23
Merged
levleontiev merged 8 commits into main on Mar 14, 2026
Conversation
… test

- Remove fake test_header_hint.py (had only pass assertions and the wrong estimator)
- Add honest header_hint integration test at the rule_engine level (circuit breaker gets the correct cost when the X-Token-Estimate header is present)
- UC-09: verify the edge evaluates policy correctly when saas_client is unreachable
- UC-16: verify rule_engine produces OpenAI-compatible RateLimit/Retry-After headers on tpm_exceeded; rename steps to "decision headers" to clarify these are engine-level headers, not the HTTP wire contract (decision_api strips X-Fairvisor-Reason in non-debug mode — documented in a scenario comment)
- UC-17/UC-18: add TK-006 (error_chunk truncation), TK-007 (shadow mode passthrough), TK-008 (truncation without partial usage) to the streaming spec
- Add limit/remaining fields to the llm_limiter reject result so _apply_limit_headers can populate the RateLimit-Limit and RateLimit-Remaining headers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
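The last bullet can be sketched in Python. This is a hypothetical analogue of how a header-mapping step like _apply_limit_headers might consume the enriched reject result; the function name, field names beyond limit/remaining, and the retry_after field are assumptions, not the actual Lua implementation.

```python
# Hypothetical sketch: the limiter reject result now carries limit/remaining,
# so the header-mapping step can emit RateLimit-Limit and RateLimit-Remaining.

def apply_limit_headers(result: dict, headers: dict) -> dict:
    """Map a limiter reject result onto standard rate-limit headers."""
    if "limit" in result:
        headers["RateLimit-Limit"] = str(result["limit"])
    if "remaining" in result:
        headers["RateLimit-Remaining"] = str(result["remaining"])
    if "retry_after" in result:
        headers["Retry-After"] = str(result["retry_after"])
    return headers

reject = {"allowed": False, "reason": "tpm_exceeded",
          "limit": 500, "remaining": 0, "retry_after": 12}
response_headers = apply_limit_headers(reject, {})
```

Before this change the reject result carried only the reason, so the header step had nothing to populate the numeric RateLimit-* fields from.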
Previously the UC-09 scenario used an allow decision, which exits before calling queue_event — so the saas_unreachable flag had no effect on the test. Now it uses a reject decision (TPM exceeded), which does call queue_event. Track saas_event_attempts in the mock to assert the call was made.

New assertion: queue_event was attempted AND the decision is still valid, proving that SaaS audit failure does not block enforcement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
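A minimal Python sketch of the test fix described above, under the stated assumptions. MockSaasClient, evaluate, and the dict shapes are hypothetical stand-ins for the actual mock and engine; only queue_event, saas_event_attempts, and the allow-exits-early behavior come from the commit message.

```python
# Hypothetical sketch of the UC-09 mock: count queue_event attempts so the
# test can assert the SaaS audit call was made even though delivery fails.

class MockSaasClient:
    def __init__(self, unreachable: bool = False):
        self.unreachable = unreachable
        self.saas_event_attempts = 0

    def queue_event(self, event: dict) -> bool:
        self.saas_event_attempts += 1
        if self.unreachable:
            return False  # audit delivery failed; enforcement must continue
        return True

def evaluate(decision: dict, saas: MockSaasClient) -> dict:
    # An allow decision exits before auditing; only rejects queue an event.
    if decision["allowed"]:
        return decision
    saas.queue_event({"reason": decision["reason"]})
    return decision

saas = MockSaasClient(unreachable=True)
decision = evaluate({"allowed": False, "reason": "tpm_exceeded"}, saas)
assert saas.saas_event_attempts == 1         # queue_event was attempted
assert decision["reason"] == "tpm_exceeded"  # enforcement not blocked
```

With an allow decision, saas_event_attempts stays at 0 regardless of the unreachable flag, which is exactly why the old test proved nothing.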
…truncation

- decision_api.lua: wire llm_limiter.build_error_response into the reject path; tpm_exceeded/tpd_exceeded/prompt_tokens_exceeded/max_tokens_per_request_exceeded now return Content-Type: application/json with the OpenAI error structure before ngx.exit(429)
- tests/e2e/test_llm_openai_contract.py: 5 tests verifying the 429 body is valid OpenAI JSON (error.type=rate_limit_error, error.code=rate_limit_exceeded, message references the reason) plus Content-Type and RateLimit-* headers still present
- tests/e2e/test_llm_streaming.py: 5 tests verifying mid-stream truncation at max_completion_tokens=10 — events 1+2 pass, event 3 triggers truncation, response ends with finish_reason=length and [DONE]
- tests/e2e/policy_llm_openai_contract.json: TPM=500, header_hint estimator
- tests/e2e/policy_llm_streaming.json: TPM=100k, defaults.max_completion_tokens=10, streaming.buffer_tokens=1, graceful_close mode
- tests/e2e/sse_fixture.txt: 3 SSE events (4+5+6 tokens) + [DONE]
- tests/e2e/mock-streaming-backend.conf: nginx serving sse_fixture.txt as SSE
- docker-compose.test.yml: edge_llm_openai_contract (18089), mock_streaming_backend, edge_llm_streaming (18090)
- conftest.py: fixtures for edge_llm_openai_contract_base_url and edge_llm_streaming_base_url

All 497 unit+integration tests continue to pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
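The 429 body shape the contract tests check can be sketched in Python. This is a hypothetical analogue of build_error_response (the real one is Lua); the message wording is an assumption, but error.type, error.code, and the Content-Type come from the commit message.

```python
import json

# Hypothetical sketch of build_error_response: shape a 429 body matching
# the OpenAI error contract asserted by test_llm_openai_contract.py.

def build_error_response(reason: str) -> tuple:
    body = {
        "error": {
            "message": f"Rate limit exceeded ({reason}).",  # wording assumed
            "type": "rate_limit_error",
            "code": "rate_limit_exceeded",
        }
    }
    headers = {"Content-Type": "application/json"}
    return json.dumps(body), headers

payload, headers = build_error_response("tpm_exceeded")
parsed = json.loads(payload)
```

OpenAI SDKs parse this structure to raise typed rate-limit errors, which is why the body must be valid JSON rather than a bare 429.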
- test_llm_openai_contract.py: remove unused 'import json'
- test_llm_streaming.py: remove unused 'import pytest'
- conftest.py: invert fixture conditionals (if not ready: skip / return url) to eliminate mixed implicit/explicit return warnings from static analysis
- bin/ci/run_e2e_smoke.sh: add test_llm_openai_contract.py and test_llm_streaming.py to the smoke test suite so the new E2E tests are gated in CI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…on POST

nginx's static file module only handles GET/HEAD — serving sse_fixture.txt via try_files returned 405 for POST /v1/chat/completions, causing all 5 streaming e2e tests to fail.

Replace mock_streaming_backend with a minimal Python http.server that:
- Accepts POST on any path → returns the SSE fixture with the correct Content-Type
- Accepts GET / → returns "ok" (used by the healthcheck)
- Embeds the SSE content inline (4+5+6 token events for the truncation test)

Remove the broken mock-streaming-backend.conf and sse_fixture.txt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
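A minimal sketch of such a mock backend. The handler class name and the exact event payloads are assumptions; the POST/GET split, the inline fixture, and the 4+5+6 token counts follow the commit message.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical sketch of the mock streaming backend: POST on any path returns
# the SSE fixture; GET answers the healthcheck. The three events carry 4, 5,
# and 6 whitespace-delimited tokens, mirroring the truncation test fixture.

SSE_FIXTURE = (
    b'data: {"choices":[{"delta":{"content":"a b c d"}}]}\n\n'
    b'data: {"choices":[{"delta":{"content":"e f g h i"}}]}\n\n'
    b'data: {"choices":[{"delta":{"content":"j k l m n o"}}]}\n\n'
    b"data: [DONE]\n\n"
)

class MockStreamingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Any POST path (e.g. /v1/chat/completions) gets the SSE stream.
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.end_headers()
        self.wfile.write(SSE_FIXTURE)

    def do_GET(self):
        # Healthcheck endpoint.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")
```

Unlike nginx's static file module, http.server dispatches on the request method, so POST is handled directly instead of returning 405.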
When nginx buffers multiple SSE events into a single body_filter call,
the early return on truncation was discarding events that had already
passed the token budget check and been added to `out`.
Fix: return table_concat(out) .. _build_termination(ctx) so that events
processed before the truncation threshold are forwarded to the client.
For single-event calls (the common streaming case), out is empty so
table_concat({}) = "" — identical behavior to before.
This was caught by the e2e streaming truncation test where the Python
mock sends the complete SSE fixture as a single HTTP response body.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
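The buffered-events fix above can be modeled in Python. This is a simplified, hypothetical model of the Lua body_filter logic (event parsing and ctx handling omitted); the `out` accumulation, the old early-return bug, and the table_concat fix follow the commit message.

```python
# Hypothetical model of the fixed body_filter: when several SSE events arrive
# in one buffer, events that already passed the budget check must be forwarded
# together with the termination chunk, not discarded.

TERMINATION = 'data: {"choices":[{"finish_reason":"length"}]}\n\ndata: [DONE]\n\n'

def filter_buffer(events, budget):
    """events: list of (chunk, token_count); returns bytes to forward."""
    out = []
    used = 0
    for chunk, tokens in events:
        if used + tokens > budget:
            # Old bug: returning only TERMINATION here dropped everything
            # already appended to `out`. Fix: concatenate passed events first.
            return "".join(out) + TERMINATION
        out.append(chunk)
        used += tokens
    return "".join(out)

events = [("e1\n\n", 4), ("e2\n\n", 5), ("e3\n\n", 6)]
result = filter_buffer(events, budget=10)
# events 1+2 (4+5 = 9 tokens) pass; event 3 would exceed 10 and triggers
# truncation, so the result is e1, e2, then the termination chunk.
```

In the single-event case `out` is empty when truncation fires, so `"".join([])` is the empty string and the behavior is identical to before the fix.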
Fixes #4.
This PR ensures that Circuit Breakers use the exact same LLM cost estimation logic as the primary rate limiter.
Changes:
- rule_engine.lua: calls llm_limiter.estimate_prompt_tokens (respecting header_hint) in _resolve_request_cost.
- rule_engine_spec.lua: added unit tests for CB cost resolution.
- test_header_hint.py: added E2E regression test.

Verified with 428+ unit tests and the E2E suite.
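The shared estimator can be sketched in Python. This is a hypothetical analogue of llm_limiter.estimate_prompt_tokens, not the Lua implementation: the X-Token-Estimate header name comes from the PR, but the chars/4 fallback heuristic is an assumption for illustration.

```python
# Hypothetical sketch of the header_hint-aware estimator that the circuit
# breaker now shares with the rate limiter: trust the client-supplied
# X-Token-Estimate header when present, else fall back to a rough heuristic.

def estimate_prompt_tokens(body: str, headers: dict) -> int:
    hint = headers.get("X-Token-Estimate")
    if hint is not None:
        try:
            return int(hint)
        except ValueError:
            pass  # malformed hint: ignore it and fall through
    return max(1, len(body) // 4)  # crude chars/4 fallback (assumption)

# With the hint, both limiter and circuit breaker see the same cost:
cost = estimate_prompt_tokens('{"messages": []}', {"X-Token-Estimate": "123"})
```

The point of the fix is that _resolve_request_cost calls this same function, so the circuit breaker and the rate limiter can no longer disagree on a request's token cost.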