fix: make upstream timeout configurable, default to 300s by syedhashmi · Pull Request #799 · katanemo/plano

syedhashmi · 2026-03-05T00:33:44Z

Placeholder PR for Adil to review the timeout-related changes.

fixes #787

…787) Hardcoded 30s timeouts in envoy config caused premature termination of long-running LLM requests (tool-use, agentic workflows). Make timeouts configurable via upstream_timeout_ms override and default to 300s.

salmanap

Some comments on the PR. Also, I think we need a test case here; otherwise it's hard to tell whether the timeout change is actually working. Lastly, I don't follow from the issue's request how the user simulated a timeout. From the looks of that code, he is adding a time.sleep to local code, which has no bearing on the response time of an upstream LLM.
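One way to make the delay happen on the upstream side rather than in the caller is a stand-in upstream that sleeps before it answers. The sketch below uses only the Python standard library. It is not Plano's test harness; `start_slow_upstream` and `completes_within` are hypothetical names introduced here for illustration.

```python
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def start_slow_upstream(delay_s):
    """Start a local HTTP server that waits delay_s before answering (hypothetical helper)."""

    class SlowHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            time.sleep(delay_s)  # the delay is on the upstream side, not in the caller
            body = b'{"ok": true}'
            try:
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            except (BrokenPipeError, ConnectionResetError):
                pass  # the caller already gave up and closed the socket

        def log_message(self, *args):
            pass  # keep test output quiet

    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}/"


def completes_within(url, timeout_s):
    """Return True if the request finishes before timeout_s, False if it times out."""
    # Bypass any proxy settings from the environment so the call stays local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout_s) as resp:
            resp.read()
        return True
    except socket.timeout:
        return False
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, socket.timeout):
            return False
        raise
```

In a real test the request would go through the gateway, with the slow server configured as the upstream, so that the gateway's timeout is the one being exercised rather than the client's.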

salmanap · 2026-03-08T20:46:45Z

cli/planoai/config_generator.py

        "upstream_tls_ca_path", "/etc/ssl/certs/ca-certificates.crt"
    )

+    upstream_timeout_ms = overrides.get("upstream_timeout_ms")


Not sure why we have an upstream_timeout_ms field when the model_listener object already has a timeout field. Can you elaborate a bit more?

syedhashmi (Mar 16, 2026): Updated to use the existing per-listener timeout.
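A minimal sketch of what reading the per-listener timeout could look like in the config generator. It assumes the listener is a plain dict whose `timeout` value is a duration string such as `30s`; that format, and the names `parse_duration_s` and `listener_timeout_s`, are assumptions made for illustration and not taken from the Plano code. The 300s fallback mirrors the default named in the PR title.

```python
import re

DEFAULT_TIMEOUT_S = 300.0  # fallback taken from the PR title; the final default was still under discussion

# Unit sizes in milliseconds, so the conversion stays in exact integer steps.
_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}
_DURATION_RE = re.compile(r"^(\d+(?:\.\d+)?)(ms|s|m|h)$")


def parse_duration_s(value):
    """Convert a duration string such as '30s' or '1500ms' to seconds."""
    match = _DURATION_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return float(amount) * _UNIT_MS[unit] / 1000


def listener_timeout_s(listener):
    """Return the listener's own timeout, or the default when none is set."""
    raw = listener.get("timeout")  # hypothetical key name
    return parse_duration_s(raw) if raw else DEFAULT_TIMEOUT_S
```

For example, `listener_timeout_s({"timeout": "30s"})` uses the listener's value, while a listener with no `timeout` key falls back to the default.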

salmanap · 2026-03-08T20:47:45Z

config/plano_config_schema.yaml

        type: boolean
      use_agent_orchestrator:
        type: boolean
+      upstream_timeout_ms:


Same as above. I don't think we need this field, especially since we already support a timeout field for model_listener objects. Please review more carefully.

salmanap · 2026-03-08T20:48:55Z

crates/prompt_gateway/src/http_context.rs

As mentioned on the Zoom call, we don't need any changes on the prompt_gateway side. The issue described the llm_gateway as the component timing out, and the developer may have had a tool-call scenario that took longer.

adilhafeez · 2026-03-09T23:50:42Z

So there are multiple timeouts we are talking about here:

Connection timeout — Envoy cluster connect_timeout (5s default)
Route timeout — Envoy route timeout (inbound client request lifecycle)
WASM outbound call timeout — dispatch_http_call Duration + x-envoy-upstream-rq-timeout-ms
Brightstaff outbound call timeout — reqwest client timeout (currently none)

For default values we should use sensible defaults for the connection and request timeouts. A developer should be able to modify them using the overrides section in config.yaml. The defaults should be defined centrally somewhere; let's discuss their values here.

For example, here is what I think the defaults should be:

connection_timeout: 1s
request_timeout: 120s
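The "defined centrally, adjustable through overrides" idea can be sketched as a single defaults object that the overrides section is merged into. The class and function names below are hypothetical, and the values are only the ones proposed in this comment, not settled defaults.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class TimeoutDefaults:
    """Single place where default timeouts live (values proposed in the thread)."""

    connection_timeout_s: float = 1.0
    request_timeout_s: float = 120.0


def resolve_timeouts(overrides):
    """Merge the overrides section of config.yaml over the central defaults."""
    known = {f.name for f in fields(TimeoutDefaults)}
    unknown = set(overrides) - known
    if unknown:
        # Reject typos early instead of silently ignoring them.
        raise ValueError(f"unknown timeout override(s): {sorted(unknown)}")
    return replace(TimeoutDefaults(), **overrides)
```

With an empty overrides dict the defaults apply unchanged; passing `{"request_timeout_s": 300.0}` changes only the request timeout.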

salmanap · 2026-03-10T00:14:38Z

I think we should expose only a single timeout field to the developer via the config right now and set sensible defaults for the rest. That one field is request_timeout; the others are internal timeouts with sensible defaults. Note that for arcfc.katanemo.dev we can't set a connection_timeout of 1s, especially for non-US access to our hosted models. It must be higher.
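A rough sketch of this single-knob approach: one developer-facing `request_timeout` value is fanned out to the layers listed earlier in the thread, while the connection timeout stays internal. The function name and dictionary keys are hypothetical; the 5s connect value is the Envoy default mentioned above, and `x-envoy-upstream-rq-timeout-ms` is the Envoy header that carries a per-request timeout in milliseconds.

```python
INTERNAL_CONNECT_TIMEOUT_S = 5.0  # internal, not exposed; Envoy default cited in the thread


def fan_out_request_timeout(request_timeout_s):
    """Derive every per-layer timeout from the one exposed request timeout."""
    if request_timeout_s <= 0:
        raise ValueError("request_timeout must be positive")
    return {
        # Envoy cluster connect_timeout: internal default, independent of the request timeout.
        "envoy_connect_timeout_s": INTERNAL_CONNECT_TIMEOUT_S,
        # Envoy route timeout: bounds the inbound client request lifecycle.
        "envoy_route_timeout_s": request_timeout_s,
        # Header sent on WASM outbound calls; Envoy expects milliseconds.
        "x-envoy-upstream-rq-timeout-ms": str(round(request_timeout_s * 1000)),
        # Timeout for the Brightstaff outbound HTTP client.
        "brightstaff_client_timeout_s": request_timeout_s,
    }
```

Keeping the fan-out in one function means the layers cannot drift apart: changing the exposed value changes the route timeout, the header, and the client timeout together.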

salmanap · 2026-03-15T04:28:37Z

@syedhashmi when are we wrapping this up? We need to get this over the finish line please.

fix: make upstream timeout configurable, default to 300s

0c7b999

syedhashmi requested a review from adilhafeez March 5, 2026 00:33

salmanap reviewed Mar 8, 2026

Syed Hashmi added 4 commits March 16, 2026 16:46

use per-listener timeout config instead of separate upstream_timeout …

3e18289

…override

remove upstream_timeout_ms from config, schema, and rust struct

125af57

revert prompt_gateway changes

95486db

fix: use prompt_gateway_listener.timeout for outbound_api_traffic lis…

8cb3471

…tener

syedhashmi wants to merge 6 commits into main from syed/issue_787

3 participants
