Description
Problem
When the AgentCore Runtime is deployed in VPC mode, users experience a ~20 second delay after the agent finishes generating its response. The UI shows "Thinking..." with the chat input blocked, even though the full response text is already visible on screen.
This also occurs in PUBLIC mode but is much less noticeable due to lower latency on direct internet calls.
Root Cause
The frontend ChatInterface.tsx sets isLoading to false in a finally block that only runs after await client.invoke(...) fully resolves — i.e., when the HTTP stream closes. The backend keeps the stream open after the last text chunk while it performs post-response work:
- Memory save (conversation history persistence)
- MCP client teardown (Gateway connection cleanup)
These operations go through VPC endpoints (PrivateLink), which add latency per call. The cumulative effect is a noticeable delay between "response visible" and "stream closed."
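A minimal sketch of the pattern described above (simplified and with illustrative names — the real ChatInterface.tsx logic may differ): the for-await loop only exits when the stream closes, so the finally block that unblocks the input runs only after the backend's memory save and MCP teardown finish.

```typescript
// Hypothetical simplification of the ChatInterface.tsx flow.
// `invoke` stands in for the streaming client call; names are illustrative.
async function sendMessage(
  invoke: () => AsyncIterable<string>,
  setIsLoading: (v: boolean) => void,
  onChunk: (text: string) => void,
): Promise<void> {
  setIsLoading(true);
  try {
    for await (const chunk of invoke()) {
      onChunk(chunk); // response text becomes visible here...
    }
    // ...but the loop only completes when the HTTP stream closes,
    // i.e. after the backend's memory save and MCP cleanup.
  } finally {
    setIsLoading(false); // in VPC mode this runs ~20s after the last chunk
  }
}
```

This makes the symptom mechanical rather than mysterious: the UI state is coupled to stream closure, not to response completion.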
Expected Behavior
The UI should unblock the chat input as soon as the agent's response text is fully streamed, regardless of backend cleanup work.
Suggested Fix
Either:
- Backend: Close the HTTP stream immediately after the response is complete, then perform memory save and MCP cleanup asynchronously (fire-and-forget or background task)
- Frontend: Detect the end of the response content (e.g., a sentinel event or stop_reason) and set isLoading(false) before the stream fully closes
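The frontend option could look roughly like the sketch below. It assumes a hypothetical sentinel event ("end_of_response") or a chunk carrying stop_reason is emitted before backend cleanup begins — the event names here are illustrative, not the actual FAST stream schema.

```typescript
// Illustrative event shape; the real stream format may differ.
type StreamEvent =
  | { type: "text"; text: string }
  | { type: "end_of_response" }; // hypothetical sentinel

async function handleStream(
  events: AsyncIterable<StreamEvent>,
  setIsLoading: (v: boolean) => void,
  onText: (t: string) => void,
): Promise<void> {
  setIsLoading(true);
  try {
    for await (const ev of events) {
      if (ev.type === "text") {
        onText(ev.text);
      } else if (ev.type === "end_of_response") {
        // Unblock the chat input immediately; keep draining the stream
        // so the connection closes cleanly while backend cleanup runs.
        setIsLoading(false);
      }
    }
  } finally {
    setIsLoading(false); // safety net if the sentinel never arrives
  }
}
```

With this shape, the input unblocks at the sentinel while the stream stays open for the remaining cleanup, which directly matches the expected behavior above.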
Reproduction
- Deploy FAST with network_mode: VPC in config.yaml
- Send any message to the agent
- Observe that the response text appears quickly, but "Thinking..." persists for ~15-20 seconds afterward
Environment
- Discovered during VPC deployment testing (feat/vpc_deployment branch merged with main)
- Affects both tool-using and non-tool responses
- More pronounced in VPC mode due to PrivateLink latency, but the underlying issue exists in PUBLIC mode too