fix: make upstream timeout configurable, default to 300s#799
syedhashmi wants to merge 6 commits into main
…787) Hardcoded 30s timeouts in envoy config caused premature termination of long-running LLM requests (tool-use, agentic workflows). Make timeouts configurable via upstream_timeout_ms override and default to 300s.
salmanap
left a comment
Some comments on the PR. Also, I think we need a test case here; otherwise it's hard to tell whether the timeout change is actually working. Lastly, I don't follow from the issue how the user simulated a timeout. From the looks of that code, he is adding a time.sleep in local code, which has no bearing on the response from an upstream LLM.
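One possible way to simulate a real upstream delay for such a test, sketched with the Python standard library only. It does not exercise Plano or Envoy; `start_slow_upstream` and `request_times_out` are hypothetical helper names, and the stub stands in for a slow LLM endpoint so the delay happens on the server side rather than in local client code.

```python
import http.client
import socket
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def start_slow_upstream(delay_s: float) -> HTTPServer:
    """Start a stub HTTP server on a free local port that waits delay_s before replying."""

    class SlowHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            time.sleep(delay_s)  # the delay happens on the upstream side
            try:
                self.send_response(200)
                self.send_header("Content-Length", "2")
                self.end_headers()
                self.wfile.write(b"ok")
            except OSError:
                pass  # the client already gave up and closed the socket

        def log_message(self, *args):
            pass  # keep test output quiet

    server = HTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_times_out(port: int, timeout_s: float) -> bool:
    """Return True if a GET to the stub exceeds timeout_s, False if it completes."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout_s)
    try:
        conn.request("GET", "/")
        conn.getresponse().read()
        return False
    except socket.timeout:
        return True
    finally:
        conn.close()
```

In a real test the stub would presumably sit behind the gateway as the configured upstream, so that the assertion covers the generated timeout rather than the client's own.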
cli/planoai/config_generator.py
Outdated
    "upstream_tls_ca_path", "/etc/ssl/certs/ca-certificates.crt"
)

upstream_timeout_ms = overrides.get("upstream_timeout_ms")
Not sure why we have an upstream_timeout_ms field when the model_listener object already has a timeout field. Can you elaborate a bit more?
Updated to use existing per listener timeout.
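A rough sketch of what resolving the per-listener timeout could look like. The real field format is not shown in this thread, so this assumes the listener is a dict whose `timeout` value is a duration string such as `30s`; `listener_timeout_ms` and `DEFAULT_TIMEOUT_MS` are hypothetical names, and the 300s fallback is taken from the PR title.

```python
import re

# Fallback when a listener sets no timeout (300s, per the PR title).
DEFAULT_TIMEOUT_MS = 300_000

# Assumed duration format: an integer followed by ms, s, or m.
_DURATION = re.compile(r"^(\d+)(ms|s|m)$")
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000}


def listener_timeout_ms(listener: dict) -> int:
    """Return the listener's timeout in milliseconds, or the default if unset."""
    raw = listener.get("timeout")
    if raw is None:
        return DEFAULT_TIMEOUT_MS
    match = _DURATION.match(str(raw).strip())
    if match is None:
        raise ValueError(f"unsupported timeout value: {raw!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_MS[unit]
```

For example, `listener_timeout_ms({"timeout": "30s"})` yields 30000, while a listener without a timeout falls back to 300000.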
config/plano_config_schema.yaml
Outdated
  type: boolean
use_agent_orchestrator:
  type: boolean
upstream_timeout_ms:
Same as above. I don't think we need this field, especially if we already support a timeout field for model_listener objects. Please review more carefully
As mentioned on the Zoom call, we don't need any changes to the prompt_gateway side of things. The issue described the llm_gateway as the one timing out, and the developer may have had a tool-call scenario that took longer.
So there are multiple timeouts we are talking about here.
For default values, we should use sensible defaults for connection and request timeouts, and a developer should be able to modify them using the overrides section in config.yaml. The defaults should be defined centrally somewhere; let's discuss their values here. For example, here is what I think the defaults should be:
I think we should only expose a single timeout field to the developer via the config right now and set sensible defaults for the rest. That one field is request_timeout; the rest are internal timeouts with sensible defaults. Note that for arcfc.katanemo.dev we can't set a connection_timeout of 1s, especially for non-US access to our hosted models. It must be higher.
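To make the proposal concrete, here is a minimal sketch of centrally defined defaults with only request_timeout exposed through overrides. Everything in it is an assumption for discussion: the `Timeouts` class, the `timeouts_from_overrides` helper, the use of seconds as the unit, and the 5s connect value, which is only a placeholder above 1s and not an agreed number.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Timeouts:
    # Developer-facing: the only value the config may override.
    request_timeout_s: float = 300.0
    # Internal: placeholder value, deliberately higher than 1s.
    connect_timeout_s: float = 5.0


# Single central definition of the defaults.
DEFAULTS = Timeouts()


def timeouts_from_overrides(overrides: dict) -> Timeouts:
    """Apply the one supported override (request_timeout, in seconds) to the defaults."""
    raw = overrides.get("request_timeout")
    if raw is None:
        return DEFAULTS
    value = float(raw)
    if value <= 0:
        raise ValueError("request_timeout must be positive")
    # Internal timeouts are never read from overrides, so they stay at their defaults.
    return replace(DEFAULTS, request_timeout_s=value)
```

Keeping the internal values out of the overrides lookup means a stray connect_timeout key in config.yaml has no effect, which matches the intent of exposing a single field.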
@syedhashmi when are we wrapping this up? We need to get this over the finish line please.
Placeholder PR for Adil to review the timeout-related changes.
fixes #787