Summary
Plano's WASM filters hardcode 30-second timeouts on all outbound HTTP calls, making Plano unusable for agentic workloads where LLM calls with tool use regularly exceed 30 seconds — despite Plano's docs explicitly supporting agent architectures.
Root Cause
Four constants in `crates/common/src/consts.rs`:

```rust
pub const ARCH_FC_REQUEST_TIMEOUT_MS: u64 = 30000;
pub const DEFAULT_TARGET_REQUEST_TIMEOUT_MS: u64 = 30000;
pub const API_REQUEST_TIMEOUT_MS: u64 = 30000;
pub const MODEL_SERVER_REQUEST_TIMEOUT_MS: u64 = 30000;
```

These are used in two ways:
- Request header: `stream_context.rs` and `http_context.rs` set `x-envoy-upstream-rq-timeout-ms: 30000` on outbound calls, overriding Envoy's route-level `timeout: 300s`.
- `dispatch_http_call`: the WASM `dispatch_http_call` function is called with a `Duration` built from these constants, hardcoding the timeout in the WASM binary itself.
The YAML listener timeout field only affects the inbound Envoy route timeout and has no effect on the WASM filter's outbound calls.
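This precedence can be sketched in a few lines (an assumption based on standard Envoy semantics, where a per-request `x-envoy-upstream-rq-timeout-ms` header overrides the route-level timeout; `effective_timeout_ms` is an illustrative helper, not Plano code):

```python
# Illustrative model of Envoy's timeout precedence (assumption: standard
# Envoy semantics; effective_timeout_ms is a hypothetical helper).
from typing import Optional

def effective_timeout_ms(route_timeout_ms: int,
                         header_timeout_ms: Optional[int]) -> int:
    # A per-request x-envoy-upstream-rq-timeout-ms header, when present,
    # takes precedence over the route-level timeout.
    return header_timeout_ms if header_timeout_ms is not None else route_timeout_ms

# The route config says 300s, but the WASM filter injects a 30000 ms header:
print(effective_timeout_ms(300_000, 30_000))  # prints 30000
```

This is why raising the listener `timeout` in YAML changes nothing: the WASM filter's injected header wins on every outbound call.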
Observed Behavior
When an LLM call takes >30s (common with tool use / function calling):
- WASM filter dispatches HTTP call with 30s timeout
- Envoy cancels the upstream connection after 30s
- The upstream (brightstaff or LLM provider) may still be processing
- LLM response arrives but the connection is already severed
- Client receives `hyper::Error(IncompleteMessage)` or a 504
The LLM call completes and tokens are consumed, but the response is discarded because the proxy chain is broken.
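One quick way to confirm this failure mode is to time the failed call: failures clustered tightly around the 30-second mark implicate the proxy rather than the provider. A minimal diagnostic sketch (the function name and tolerance are illustrative, not part of Plano):

```python
# Diagnostic sketch: failures near the 30s mark point at the proxy's
# hardcoded timeout rather than provider latency.
PROXY_TIMEOUT_S = 30.0

def classify_failure(elapsed_s: float, tolerance_s: float = 2.0) -> str:
    """Return 'proxy-timeout' when a request dies close to the 30s mark."""
    if abs(elapsed_s - PROXY_TIMEOUT_S) <= tolerance_s:
        return "proxy-timeout"
    return "other"

# A request severed at ~30s matches the WASM filter's hardcoded timeout;
# one failing at 8s points elsewhere (auth, DNS, provider error, ...).
print(classify_failure(30.4))  # prints proxy-timeout
print(classify_failure(8.0))   # prints other
```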
Reproduce
```yaml
# plano_config.yaml
version: v0.3.0

model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY

listeners:
  - type: model
    name: llm-gateway
    port: 12000
    timeout: 300s  # does NOT help
```

```python
# test_timeout.py — pip install openai
import openai, time

client = openai.OpenAI(
    base_url="http://localhost:12000/v1",
    api_key="unused",
)

tools = [{
    "type": "function",
    "function": {
        "name": "slow_lookup",
        "description": "Look up data from a slow database",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }
}]

prompt = ("Look up the population of every Scandinavian country "
          "using slow_lookup for each one, then summarize.")

# Step 1: model requests tool calls
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    tools=tools,
)

# Step 2: simulate slow tool execution (agent loop)
messages = [{"role": "user", "content": prompt}]
messages.append(response.choices[0].message)
for tc in response.choices[0].message.tool_calls:
    time.sleep(8)  # 4-5 tool calls * 8s = 32-40s total
    messages.append({
        "role": "tool",
        "tool_call_id": tc.id,
        "content": "Population: 5,400,000"
    })

# Step 3: send tool results back through Plano
try:
    final = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=messages,
        tools=tools,
    )
    print("Success:", final.choices[0].message.content[:200])
except Exception as e:
    print(f"Failed: {e}")
```

In production, PydanticAI agents with 3-4 tool calls consistently fail at ~30s through Plano but work fine calling providers directly.
Suggested Fix
Either:

- Read the listener `timeout` config in the WASM filter and use it for outbound calls
- Add a dedicated `upstream_timeout` config field
- Remove the hardcoded header and let Envoy's route-level `timeout: 300s` govern
For agentic workloads, 120-300s is a reasonable default.
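A dedicated field could look like the sketch below (`upstream_timeout` is a proposed name, not an existing Plano option):

```yaml
listeners:
  - type: model
    name: llm-gateway
    port: 12000
    timeout: 300s            # inbound Envoy route timeout (exists today)
    upstream_timeout: 300s   # proposed: would replace the hardcoded 30s in
                             # dispatch_http_call and the x-envoy header
```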
Environment
- `katanemo/plano:0.4.8`
- Kubernetes
- Models: OpenAI GPT-4o, Anthropic Claude Sonnet via PydanticAI agents