Summary
Plano's WASM filters hardcode 30-second timeouts on all outbound HTTP calls, making Plano unusable for agentic workloads where LLM calls with tool use regularly exceed 30 seconds — despite Plano's docs explicitly supporting agent architectures.
Root Cause
Four constants in `crates/common/src/consts.rs`:

```rust
pub const ARCH_FC_REQUEST_TIMEOUT_MS: u64 = 30000;
pub const DEFAULT_TARGET_REQUEST_TIMEOUT_MS: u64 = 30000;
pub const API_REQUEST_TIMEOUT_MS: u64 = 30000;
pub const MODEL_SERVER_REQUEST_TIMEOUT_MS: u64 = 30000;
```

These are used in two ways:
- Request header: `stream_context.rs` and `http_context.rs` set `x-envoy-upstream-rq-timeout-ms: 30000` on outbound calls, overriding Envoy's route-level `timeout: 300s`.
- `dispatch_http_call`: the WASM `dispatch_http_call` function is called with a `Duration` built from these constants, hardcoding the timeout in the WASM binary itself.
The YAML listener timeout field only affects the inbound Envoy route timeout and has no effect on the WASM filter's outbound calls.
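This precedence can be sketched in a few lines (an assumption based on standard Envoy semantics, where a per-request `x-envoy-upstream-rq-timeout-ms` header overrides the route-level timeout; `effective_timeout_ms` is an illustrative helper, not Plano code):

```python
# Illustrative model of Envoy's timeout precedence (assumption: standard
# Envoy semantics; effective_timeout_ms is a hypothetical helper).
from typing import Optional

def effective_timeout_ms(route_timeout_ms: int,
                         header_timeout_ms: Optional[int]) -> int:
    # A per-request x-envoy-upstream-rq-timeout-ms header, when present,
    # takes precedence over the route-level timeout.
    return header_timeout_ms if header_timeout_ms is not None else route_timeout_ms

# The route config says 300s, but the WASM filter injects a 30000 ms header:
print(effective_timeout_ms(300_000, 30_000))  # prints 30000
```

This is why raising the listener `timeout` in YAML changes nothing: the WASM filter's injected header wins on every outbound call.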
Observed Behavior
When an LLM call takes >30s (common with tool use / function calling):
- WASM filter dispatches HTTP call with 30s timeout
- Envoy cancels the upstream connection after 30s
- The upstream (brightstaff or LLM provider) may still be processing
- LLM response arrives but the connection is already severed
- Client receives `hyper::Error(IncompleteMessage)` or a 504
The LLM call completes and tokens are consumed, but the response is discarded because the proxy chain is broken.
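One quick way to confirm this failure mode is to time the failed call: failures clustered tightly around the 30-second mark implicate the proxy rather than the provider. A minimal diagnostic sketch (the function name and tolerance are illustrative, not part of Plano):

```python
# Diagnostic sketch: failures near the 30s mark point at the proxy's
# hardcoded timeout rather than provider latency.
PROXY_TIMEOUT_S = 30.0

def classify_failure(elapsed_s: float, tolerance_s: float = 2.0) -> str:
    """Return 'proxy-timeout' when a request dies close to the 30s mark."""
    if abs(elapsed_s - PROXY_TIMEOUT_S) <= tolerance_s:
        return "proxy-timeout"
    return "other"

# A request severed at ~30s matches the WASM filter's hardcoded timeout;
# one failing at 8s points elsewhere (auth, DNS, provider error, ...).
print(classify_failure(30.4))  # prints proxy-timeout
print(classify_failure(8.0))   # prints other
```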
Reproduce
```yaml
# plano_config.yaml
version: v0.3.0

model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY

listeners:
  - type: model
    name: llm-gateway
    port: 12000
    timeout: 300s  # does NOT help
```

```python
# test_timeout.py — pip install openai
import openai, time

client = openai.OpenAI(
    base_url="http://localhost:12000/v1",
    api_key="unused",
)

tools = [{
    "type": "function",
    "function": {
        "name": "slow_lookup",
        "description": "Look up data from a slow database",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }
}]

prompt = ("Look up the population of every Scandinavian country "
          "using slow_lookup for each one, then summarize.")

# Step 1: model requests tool calls
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    tools=tools,
)

# Step 2: simulate slow tool execution (agent loop)
messages = [{"role": "user", "content": prompt}]
messages.append(response.choices[0].message)
for tc in response.choices[0].message.tool_calls:
    time.sleep(8)  # 4-5 tool calls * 8s = 32-40s total
    messages.append({
        "role": "tool",
        "tool_call_id": tc.id,
        "content": "Population: 5,400,000"
    })

# Step 3: send tool results back through Plano
try:
    final = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=messages,
        tools=tools,
    )
    print("Success:", final.choices[0].message.content[:200])
except Exception as e:
    print(f"Failed: {e}")
```

In production, PydanticAI agents with 3-4 tool calls consistently fail at ~30s through Plano but work fine calling providers directly.
Suggested Fix
Either:

- Read the listener `timeout` config in the WASM filter and use it for outbound calls
- Add a dedicated `upstream_timeout` config field
- Remove the hardcoded header and let Envoy's route-level `timeout: 300s` govern
For agentic workloads, 120-300s is a reasonable default.
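A dedicated field could look like the sketch below (`upstream_timeout` is a proposed name, not an existing Plano option):

```yaml
listeners:
  - type: model
    name: llm-gateway
    port: 12000
    timeout: 300s            # inbound Envoy route timeout (exists today)
    upstream_timeout: 300s   # proposed: would replace the hardcoded 30s in
                             # dispatch_http_call and the x-envoy header
```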
Environment
- `katanemo/plano:0.4.8`
- Kubernetes
- Models: OpenAI GPT-4o, Anthropic Claude Sonnet via PydanticAI agents