fix(issue-4): use LLM-aware estimator for circuit breaker cost resolution (with E2E) #23
Merged
levleontiev merged 8 commits into main on Mar 14, 2026
Conversation
… test

- Remove fake test_header_hint.py (had only pass assertions and the wrong estimator)
- Add honest header_hint integration test at the rule_engine level (circuit breaker gets the correct cost when the X-Token-Estimate header is present)
- UC-09: verify the edge evaluates policy correctly when saas_client is unreachable
- UC-16: verify rule_engine produces OpenAI-compatible RateLimit/Retry-After headers on tpm_exceeded; rename steps to "decision headers" to clarify these are engine-level headers, not the HTTP wire contract (decision_api strips X-Fairvisor-Reason in non-debug mode — documented in a scenario comment)
- UC-17/UC-18: add TK-006 (error_chunk truncation), TK-007 (shadow mode passthrough), TK-008 (truncation without partial usage) to the streaming spec
- Add limit/remaining fields to the llm_limiter reject result so _apply_limit_headers can populate the RateLimit-Limit and RateLimit-Remaining headers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
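The last bullet can be sketched in Python. This is a hypothetical analogue of how a header-mapping step like _apply_limit_headers might consume the enriched reject result; the function name, field names beyond limit/remaining, and the retry_after field are assumptions, not the actual Lua implementation.

```python
# Hypothetical sketch: the limiter reject result now carries limit/remaining,
# so the header-mapping step can emit RateLimit-Limit and RateLimit-Remaining.

def apply_limit_headers(result: dict, headers: dict) -> dict:
    """Map a limiter reject result onto standard rate-limit headers."""
    if "limit" in result:
        headers["RateLimit-Limit"] = str(result["limit"])
    if "remaining" in result:
        headers["RateLimit-Remaining"] = str(result["remaining"])
    if "retry_after" in result:
        headers["Retry-After"] = str(result["retry_after"])
    return headers

reject = {"allowed": False, "reason": "tpm_exceeded",
          "limit": 500, "remaining": 0, "retry_after": 12}
response_headers = apply_limit_headers(reject, {})
```

Before this change the reject result carried only the reason, so the header step had nothing to populate the numeric RateLimit-* fields from.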
Previously the UC-09 scenario used an allow decision, which exits before calling queue_event — so the saas_unreachable flag had no effect on the test. Now it uses a reject decision (TPM exceeded), which does call queue_event. Track saas_event_attempts in the mock to assert the call was made.

New assertion: queue_event was attempted AND the decision is still valid, proving that SaaS audit failure does not block enforcement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
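A minimal Python sketch of the test fix described above, under the stated assumptions. MockSaasClient, evaluate, and the dict shapes are hypothetical stand-ins for the actual mock and engine; only queue_event, saas_event_attempts, and the allow-exits-early behavior come from the commit message.

```python
# Hypothetical sketch of the UC-09 mock: count queue_event attempts so the
# test can assert the SaaS audit call was made even though delivery fails.

class MockSaasClient:
    def __init__(self, unreachable: bool = False):
        self.unreachable = unreachable
        self.saas_event_attempts = 0

    def queue_event(self, event: dict) -> bool:
        self.saas_event_attempts += 1
        if self.unreachable:
            return False  # audit delivery failed; enforcement must continue
        return True

def evaluate(decision: dict, saas: MockSaasClient) -> dict:
    # An allow decision exits before auditing; only rejects queue an event.
    if decision["allowed"]:
        return decision
    saas.queue_event({"reason": decision["reason"]})
    return decision

saas = MockSaasClient(unreachable=True)
decision = evaluate({"allowed": False, "reason": "tpm_exceeded"}, saas)
assert saas.saas_event_attempts == 1         # queue_event was attempted
assert decision["reason"] == "tpm_exceeded"  # enforcement not blocked
```

With an allow decision, saas_event_attempts stays at 0 regardless of the unreachable flag, which is exactly why the old test proved nothing.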
…truncation

- decision_api.lua: wire llm_limiter.build_error_response into the reject path; tpm_exceeded/tpd_exceeded/prompt_tokens_exceeded/max_tokens_per_request_exceeded now return Content-Type: application/json with the OpenAI error structure before ngx.exit(429)
- tests/e2e/test_llm_openai_contract.py: 5 tests verifying the 429 body is valid OpenAI JSON (error.type=rate_limit_error, error.code=rate_limit_exceeded, message references the reason) plus Content-Type and RateLimit-* headers still present
- tests/e2e/test_llm_streaming.py: 5 tests verifying mid-stream truncation at max_completion_tokens=10 — events 1+2 pass, event 3 triggers truncation, response ends with finish_reason=length and [DONE]
- tests/e2e/policy_llm_openai_contract.json: TPM=500, header_hint estimator
- tests/e2e/policy_llm_streaming.json: TPM=100k, defaults.max_completion_tokens=10, streaming.buffer_tokens=1, graceful_close mode
- tests/e2e/sse_fixture.txt: 3 SSE events (4+5+6 tokens) + [DONE]
- tests/e2e/mock-streaming-backend.conf: nginx serving sse_fixture.txt as SSE
- docker-compose.test.yml: edge_llm_openai_contract (18089), mock_streaming_backend, edge_llm_streaming (18090)
- conftest.py: fixtures for edge_llm_openai_contract_base_url and edge_llm_streaming_base_url

All 497 unit+integration tests continue to pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
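The 429 body shape the contract tests check can be sketched in Python. This is a hypothetical analogue of build_error_response (the real one is Lua); the message wording is an assumption, but error.type, error.code, and the Content-Type come from the commit message.

```python
import json

# Hypothetical sketch of build_error_response: shape a 429 body matching
# the OpenAI error contract asserted by test_llm_openai_contract.py.

def build_error_response(reason: str) -> tuple:
    body = {
        "error": {
            "message": f"Rate limit exceeded ({reason}).",  # wording assumed
            "type": "rate_limit_error",
            "code": "rate_limit_exceeded",
        }
    }
    headers = {"Content-Type": "application/json"}
    return json.dumps(body), headers

payload, headers = build_error_response("tpm_exceeded")
parsed = json.loads(payload)
```

OpenAI SDKs parse this structure to raise typed rate-limit errors, which is why the body must be valid JSON rather than a bare 429.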
- test_llm_openai_contract.py: remove unused 'import json'
- test_llm_streaming.py: remove unused 'import pytest'
- conftest.py: invert fixture conditionals (if not ready: skip / return url) to eliminate mixed implicit/explicit return warnings from static analysis
- bin/ci/run_e2e_smoke.sh: add test_llm_openai_contract.py and test_llm_streaming.py to the smoke test suite so the new E2E tests are gated in CI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…on POST

nginx's static file module only handles GET/HEAD — serving sse_fixture.txt via try_files returned 405 for POST /v1/chat/completions, causing all 5 streaming e2e tests to fail.

Replace mock_streaming_backend with a minimal Python http.server that:
- Accepts POST on any path → returns the SSE fixture with the correct Content-Type
- Accepts GET / → returns "ok" (used by the healthcheck)
- Embeds the SSE content inline (4+5+6 token events for the truncation test)

Remove the broken mock-streaming-backend.conf and sse_fixture.txt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
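A minimal sketch of such a mock backend. The handler class name and the exact event payloads are assumptions; the POST/GET split, the inline fixture, and the 4+5+6 token counts follow the commit message.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical sketch of the mock streaming backend: POST on any path returns
# the SSE fixture; GET answers the healthcheck. The three events carry 4, 5,
# and 6 whitespace-delimited tokens, mirroring the truncation test fixture.

SSE_FIXTURE = (
    b'data: {"choices":[{"delta":{"content":"a b c d"}}]}\n\n'
    b'data: {"choices":[{"delta":{"content":"e f g h i"}}]}\n\n'
    b'data: {"choices":[{"delta":{"content":"j k l m n o"}}]}\n\n'
    b"data: [DONE]\n\n"
)

class MockStreamingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Any POST path (e.g. /v1/chat/completions) gets the SSE stream.
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.end_headers()
        self.wfile.write(SSE_FIXTURE)

    def do_GET(self):
        # Healthcheck endpoint.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")
```

Unlike nginx's static file module, http.server dispatches on the request method, so POST is handled directly instead of returning 405.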
When nginx buffers multiple SSE events into a single body_filter call,
the early return on truncation was discarding events that had already
passed the token budget check and been added to `out`.
Fix: return table_concat(out) .. _build_termination(ctx) so that events
processed before the truncation threshold are forwarded to the client.
For single-event calls (the common streaming case), out is empty so
table_concat({}) = "" — identical behavior to before.
This was caught by the e2e streaming truncation test where the Python
mock sends the complete SSE fixture as a single HTTP response body.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
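The buffered-events fix above can be modeled in Python. This is a simplified, hypothetical model of the Lua body_filter logic (event parsing and ctx handling omitted); the `out` accumulation, the old early-return bug, and the table_concat fix follow the commit message.

```python
# Hypothetical model of the fixed body_filter: when several SSE events arrive
# in one buffer, events that already passed the budget check must be forwarded
# together with the termination chunk, not discarded.

TERMINATION = 'data: {"choices":[{"finish_reason":"length"}]}\n\ndata: [DONE]\n\n'

def filter_buffer(events, budget):
    """events: list of (chunk, token_count); returns bytes to forward."""
    out = []
    used = 0
    for chunk, tokens in events:
        if used + tokens > budget:
            # Old bug: returning only TERMINATION here dropped everything
            # already appended to `out`. Fix: concatenate passed events first.
            return "".join(out) + TERMINATION
        out.append(chunk)
        used += tokens
    return "".join(out)

events = [("e1\n\n", 4), ("e2\n\n", 5), ("e3\n\n", 6)]
result = filter_buffer(events, budget=10)
# events 1+2 (4+5 = 9 tokens) pass; event 3 would exceed 10 and triggers
# truncation, so the result is e1, e2, then the termination chunk.
```

In the single-event case `out` is empty when truncation fires, so `"".join([])` is the empty string and the behavior is identical to before the fix.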
Fixes #4.
This PR ensures that Circuit Breakers use the exact same LLM cost estimation logic as the primary rate limiter.
Changes:
- rule_engine.lua: calls llm_limiter.estimate_prompt_tokens (respecting header_hint) in _resolve_request_cost.
- rule_engine_spec.lua: added unit tests for CB cost resolution.
- test_header_hint.py: added E2E regression test.

Verified with 428+ unit tests and the E2E suite.
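The shared estimator can be sketched in Python. This is a hypothetical analogue of llm_limiter.estimate_prompt_tokens, not the Lua implementation: the X-Token-Estimate header name comes from the PR, but the chars/4 fallback heuristic is an assumption for illustration.

```python
# Hypothetical sketch of the header_hint-aware estimator that the circuit
# breaker now shares with the rate limiter: trust the client-supplied
# X-Token-Estimate header when present, else fall back to a rough heuristic.

def estimate_prompt_tokens(body: str, headers: dict) -> int:
    hint = headers.get("X-Token-Estimate")
    if hint is not None:
        try:
            return int(hint)
        except ValueError:
            pass  # malformed hint: ignore it and fall through
    return max(1, len(body) // 4)  # crude chars/4 fallback (assumption)

# With the hint, both limiter and circuit breaker see the same cost:
cost = estimate_prompt_tokens('{"messages": []}', {"X-Token-Estimate": "123"})
```

The point of the fix is that _resolve_request_cost calls this same function, so the circuit breaker and the rate limiter can no longer disagree on a request's token cost.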