
WASM filter hardcoded 30s timeout breaks agentic/tool-use workloads #787

@llrightll

Summary

Plano's WASM filters hardcode 30-second timeouts on all outbound HTTP calls, making Plano unusable for agentic workloads where LLM calls with tool use regularly exceed 30 seconds, even though Plano's docs explicitly advertise support for agent architectures.

Root Cause

Four constants in crates/common/src/consts.rs:

```rust
pub const ARCH_FC_REQUEST_TIMEOUT_MS: u64 = 30000;
pub const DEFAULT_TARGET_REQUEST_TIMEOUT_MS: u64 = 30000;
pub const API_REQUEST_TIMEOUT_MS: u64 = 30000;
pub const MODEL_SERVER_REQUEST_TIMEOUT_MS: u64 = 30000;
```

These are used in two ways:

  1. **Request headers:** `stream_context.rs` and `http_context.rs` set `x-envoy-upstream-rq-timeout-ms: 30000` on outbound calls, overriding Envoy's route-level `timeout: 300s`.

  2. **`dispatch_http_call`:** the WASM `dispatch_http_call` function is called with a `Duration` built from these constants, hardcoding the timeout into the compiled WASM binary itself.

The YAML listener `timeout` field only affects the inbound Envoy route timeout and has no effect on the WASM filter's outbound calls.
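To make the two paths concrete, here is a plain-Rust sketch; the helper function names are hypothetical, and only the constant name and header name come from the code above:

```rust
use std::time::Duration;

// From crates/common/src/consts.rs (see above).
pub const DEFAULT_TARGET_REQUEST_TIMEOUT_MS: u64 = 30000;

// Path 1 (hypothetical helper): the header value Envoy sees on every
// outbound call is derived from the compile-time constant, so it
// overrides the route-level `timeout: 300s`.
fn upstream_timeout_header() -> (String, String) {
    (
        "x-envoy-upstream-rq-timeout-ms".to_string(),
        DEFAULT_TARGET_REQUEST_TIMEOUT_MS.to_string(),
    )
}

// Path 2 (hypothetical helper): the same constant becomes the Duration
// handed to dispatch_http_call, baking 30s into the WASM binary itself.
fn dispatch_timeout() -> Duration {
    Duration::from_millis(DEFAULT_TARGET_REQUEST_TIMEOUT_MS)
}

fn main() {
    let (name, value) = upstream_timeout_header();
    println!("{name}: {value}"); // x-envoy-upstream-rq-timeout-ms: 30000
    assert_eq!(dispatch_timeout(), Duration::from_secs(30));
}
```

Either path alone is enough to cap a request at 30s; fixing only the header still leaves the `dispatch_http_call` `Duration` in place.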

Observed Behavior

When an LLM call takes >30s (common with tool use / function calling):

  1. WASM filter dispatches HTTP call with 30s timeout
  2. Envoy cancels the upstream connection after 30s
  3. The upstream (brightstaff or LLM provider) may still be processing
  4. LLM response arrives but the connection is already severed
  5. Client receives `hyper::Error(IncompleteMessage)` or a 504

The LLM call completes and tokens are consumed, but the response is discarded because the proxy chain is broken.

Reproduce

```yaml
# plano_config.yaml
version: v0.3.0
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
listeners:
  - type: model
    name: llm-gateway
    port: 12000
    timeout: 300s   # does NOT help
```
```python
# test_timeout.py — pip install openai
import openai, time

client = openai.OpenAI(
    base_url="http://localhost:12000/v1",
    api_key="unused",
)

tools = [{
    "type": "function",
    "function": {
        "name": "slow_lookup",
        "description": "Look up data from a slow database",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }
}]

prompt = ("Look up the population of every Scandinavian country "
          "using slow_lookup for each one, then summarize.")
messages = [{"role": "user", "content": prompt}]

# Step 1: model requests tool calls
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=messages,
    tools=tools,
)
messages.append(response.choices[0].message)

# Step 2: simulate slow tool execution (agent loop)
for tc in response.choices[0].message.tool_calls:
    time.sleep(8)  # 4-5 tools * 8s = 32-40s total
    messages.append({
        "role": "tool",
        "tool_call_id": tc.id,
        "content": "Population: 5,400,000"
    })

# Step 3: send tool results back through Plano
try:
    final = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=messages,
        tools=tools,
    )
    print("Success:", final.choices[0].message.content[:200])
except Exception as e:
    print(f"Failed: {e}")
```

In production, PydanticAI agents with 3-4 tool calls consistently fail at ~30s through Plano but work fine calling providers directly.

Suggested Fix

Either:

  1. Read the listener `timeout` config in the WASM filter and use it for outbound calls
  2. Add a dedicated `upstream_timeout` config field
  3. Remove the hardcoded header and let Envoy's route-level `timeout: 300s` govern
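A minimal sketch of options 1 and 2, assuming a hypothetical `resolve_upstream_timeout` helper that parses a listener-style value such as `300s` or `45000ms` and falls back to the current 30s default (the names and parsing rules are illustrative, not Plano's actual config API):

```rust
use std::time::Duration;

const DEFAULT_TARGET_REQUEST_TIMEOUT_MS: u64 = 30000;

// Parse "45000ms" or "300s" into a Duration; anything else is rejected.
// The "ms" check must come first, since "45000ms" also ends in 's'.
fn parse_timeout(s: &str) -> Option<Duration> {
    if let Some(ms) = s.strip_suffix("ms") {
        return ms.parse().ok().map(Duration::from_millis);
    }
    if let Some(secs) = s.strip_suffix('s') {
        return secs.parse().ok().map(Duration::from_secs);
    }
    None
}

// Use the configured listener timeout when present and valid,
// otherwise keep the existing compiled-in default.
fn resolve_upstream_timeout(configured: Option<&str>) -> Duration {
    configured
        .and_then(parse_timeout)
        .unwrap_or(Duration::from_millis(DEFAULT_TARGET_REQUEST_TIMEOUT_MS))
}

fn main() {
    assert_eq!(resolve_upstream_timeout(Some("300s")), Duration::from_secs(300));
    assert_eq!(resolve_upstream_timeout(None), Duration::from_secs(30));
}
```

Option 3 is the smallest change: stop setting `x-envoy-upstream-rq-timeout-ms` and let the route-level `timeout: 300s` apply.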

For agentic workloads, 120-300s is a reasonable default.

Environment

  • katanemo/plano:0.4.8
  • Kubernetes
  • Models: OpenAI GPT-4o, Anthropic Claude Sonnet via PydanticAI agents
