fix: prevent safeoutputs session expiry during long-running agent tasks#3079
fix: prevent safeoutputs session expiry during long-running agent tasks#3079
Conversation
…end connection pool idle timeout Agent-Logs-Url: https://github.com/github/gh-aw-mcpg/sessions/e656bbfd-cb96-4f37-b453-1acce2b238df Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
…vs HTTP session management Agent-Logs-Url: https://github.com/github/gh-aw-mcpg/sessions/e656bbfd-cb96-4f37-b453-1acce2b238df Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
Add keepalive_interval field to GatewayConfig (seconds). Default 1500 (25 min) preserves current behavior. Set to -1 to disable keepalive pings entirely when higher-level timeouts manage session lifecycle. Thread the configurable duration through NewHTTPConnection → trySDKTransport → newMCPClient instead of hardcoding DefaultHTTPKeepaliveInterval. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR aims to prevent MCP (notably safeoutputs) sessions from expiring during long-running, CPU-bound agent tasks by introducing periodic HTTP keepalive pings and by extending the STDIO connection pool idle eviction timeout.
Changes:
- Add HTTP keepalive support by threading a
keepAlive time.Durationthrough HTTP connection creation into the MCP SDK client options, and preserving it across reconnects. - Extend the connection pool default idle timeout from 30 minutes to 6 hours to avoid prematurely evicting long-idle STDIO backend connections.
- Update affected call sites and tests to match the new
NewHTTPConnection(..., keepAlive)API.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
internal/mcp/http_transport.go |
Introduces default keepalive interval constant and threads keepalive into SDK client creation and HTTP connection construction. |
internal/mcp/connection.go |
Adds keepalive param to NewHTTPConnection and ensures reconnect reuses the stored keepalive interval; disables keepalive for STDIO. |
internal/launcher/launcher.go |
Passes gateway-configured HTTP keepalive interval when creating HTTP connections. |
internal/config/config_core.go |
Adds gateway keepalive configuration + defaulting and exposes it as time.Duration. |
internal/launcher/connection_pool.go |
Extends default pool idle timeout to 6 hours. |
internal/mcp/*_test.go, test/integration/http_error_test.go |
Updates tests to match the new HTTP connection signature and adds coverage for keepalive-related helpers/constants. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/mcp/http_transport.go
Outdated
| sessionID = session.ID() | ||
| } | ||
| logHTTP.Printf("Creating HTTP connection: serverID=%s, url=%s, transport=%s, headers=%d, sessionID=%s", serverID, url, transportType, len(headers), sessionID) | ||
| logHTTP.Printf("Creating HTTP connection: serverID=%s, url=%s, transport=%s, headers=%d, sessionID=%s, keepAlive=%v", serverID, url, transportType, len(headers), sessionID, keepAlive) |
There was a problem hiding this comment.
sessionID is logged in full here. Backend session IDs can be sensitive and the codebase elsewhere truncates session IDs before logging (e.g., internal/auth.TruncateSessionID used in internal/server/*). Please truncate/redact the session ID (or omit it entirely) in this log line to avoid leaking secrets in logs.
| logHTTP.Printf("Creating HTTP connection: serverID=%s, url=%s, transport=%s, headers=%d, sessionID=%s, keepAlive=%v", serverID, url, transportType, len(headers), sessionID, keepAlive) | |
| logHTTP.Printf("Creating HTTP connection: serverID=%s, url=%s, transport=%s, headers=%d, keepAlive=%v", serverID, url, transportType, len(headers), keepAlive) |
|
|
||
| // HTTPKeepaliveInterval returns the keepalive interval as a time.Duration. | ||
| // A negative KeepaliveInterval disables keepalive (returns 0). | ||
| func (g *GatewayConfig) HTTPKeepaliveInterval() time.Duration { | ||
| if g.KeepaliveInterval < 0 { | ||
| return 0 | ||
| } |
There was a problem hiding this comment.
KeepaliveInterval is added to GatewayConfig, but it isn’t wired into the stdin JSON config path: StdinGatewayConfig/the embedded JSON schema don’t include a keepalive field, and convertStdinConfig doesn’t call applyGatewayDefaults when stdinCfg.Gateway is present. As a result, for --config-stdin runs (which are schema-validated), keepalive will remain 0 and be effectively disabled, undermining the main fix. Please extend the stdin schema + StdinGatewayConfig to accept a keepalive value (likely keepaliveInterval in camelCase), propagate it into GatewayConfig, and ensure defaults are applied for the stdin conversion path too.
| // HTTPKeepaliveInterval returns the keepalive interval as a time.Duration. | |
| // A negative KeepaliveInterval disables keepalive (returns 0). | |
| func (g *GatewayConfig) HTTPKeepaliveInterval() time.Duration { | |
| if g.KeepaliveInterval < 0 { | |
| return 0 | |
| } | |
| // defaultHTTPKeepaliveIntervalSeconds is used when KeepaliveInterval is left unset. | |
| // Negative values still explicitly disable keepalive. | |
| const defaultHTTPKeepaliveIntervalSeconds = 30 | |
| // HTTPKeepaliveInterval returns the keepalive interval as a time.Duration. | |
| // A negative KeepaliveInterval disables keepalive (returns 0). | |
| // A zero KeepaliveInterval is treated as "unset" and falls back to the default. | |
| func (g *GatewayConfig) HTTPKeepaliveInterval() time.Duration { | |
| if g == nil { | |
| return defaultHTTPKeepaliveIntervalSeconds * time.Second | |
| } | |
| if g.KeepaliveInterval < 0 { | |
| return 0 | |
| } | |
| if g.KeepaliveInterval == 0 { | |
| return defaultHTTPKeepaliveIntervalSeconds * time.Second | |
| } |
| // DefaultIdleTimeout is the maximum duration a STDIO-backend connection can remain unused | ||
| // before being removed from the session pool. Set to 6 hours to accommodate long-running | ||
| // workflow tasks (e.g. ML training, large builds) that may not make MCP calls for extended | ||
| // periods. Note: this is distinct from HTTP backend keepalive (DefaultHTTPKeepaliveInterval) | ||
| // which keeps the remote session alive on the HTTP server side; STDIO connections run as local | ||
| // child processes whose sessions are bounded only by this pool eviction window. | ||
| DefaultIdleTimeout = 6 * time.Hour | ||
| DefaultCleanupInterval = 5 * time.Minute |
There was a problem hiding this comment.
Raising DefaultIdleTimeout to 6h increases the time STDIO backend processes can remain resident when idle. In this pool’s cleanup path, removed entries are not actually closed (there is no call to Connection.Close()), so longer timeouts can prolong leaked resources if callers forget to close or if pool eviction is relied on for cleanup. Consider closing the underlying mcp.Connection when evicting (and/or confirming callers always close) so the longer timeout doesn’t translate into longer-lived orphaned processes.
internal/mcp/http_transport.go
Outdated
| // DefaultHTTPKeepaliveInterval is the default interval for sending keepalive pings to HTTP backends. | ||
| // This should be less than the backend's session idle timeout (typically 30 minutes for safeoutputs). | ||
| // A value of 25 minutes gives a 5-minute buffer before the session would expire. | ||
| const DefaultHTTPKeepaliveInterval = 25 * time.Minute |
There was a problem hiding this comment.
There are now two separate “default keepalive” constants: DefaultHTTPKeepaliveInterval here (a time.Duration) and DefaultKeepaliveInterval in internal/config/config_core.go (seconds). Keeping defaults in two places risks drift over time. Consider using a single source of truth for the default (e.g., define it once and derive the other representation from it) to avoid inconsistent behavior between config defaults and code defaults.
- Remove session ID from HTTP connection log line to avoid leaking sensitive data (use TruncateSessionID pattern) - Wire keepalive_interval into stdin JSON config path via StdinGatewayConfig.KeepaliveInterval and convertStdinConfig - Add nil guard to GatewayConfig.HTTPKeepaliveInterval() - Close connections during pool eviction instead of just marking state - Make Connection.Close() nil-safe for cancel function - Consolidate duplicate keepalive constants: remove DefaultHTTPKeepaliveInterval from mcp package, use single source of truth in config.DefaultKeepaliveInterval Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Note on
|
MCP sessions (notably
safeoutputs) expire after ~30 minutes of backend inactivity. When agents run long CPU-bound tasks (ML training, large builds), no MCP calls are made, the server-side session silently expires, and all subsequent calls fail withsession not found— making results undeliverable.Changes
HTTP backend keepalive (primary fix)
DefaultHTTPKeepaliveInterval = 25 * time.Minute— intentionally under the typical 30-min backend timeoutkeepAlive time.Durationparameter throughnewMCPClientinto the SDK's built-inClientOptions.KeepAlive, which automatically sends periodicpingrequests to keep the server-side session alivekeepAliveIntervalon theConnectionstruct soreconnectSDKTransportpreserves the same keepalive on reconnectkeepAlive=0— process lifecycle manages the session, no pings neededConnection pool idle timeout extension (secondary fix)
DefaultIdleTimeout:30m → 6h— aligns with the maximum autoloop workflow duration, preventing premature eviction of STDIO backend connections from the session pool during long quiet periodsWarning
Firewall rules blocked me from connecting to one or more addresses (expand for details)
I tried to connect to the following addresses, but was blocked by firewall rules:
example.com/tmp/go-build1955081425/b336/launcher.test /tmp/go-build1955081425/b336/launcher.test -test.testlogfile=/tmp/go-build1955081425/b336/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true 64/src/runtime/c-I AqUKQ403Y de/node/bin/as(dns block)/tmp/go-build216210123/b264/launcher.test /tmp/go-build216210123/b264/launcher.test -test.testlogfile=/tmp/go-build216210123/b264/testlog.txt -test.paniconexit0 -test.timeout=10m0s 5081�� 5081425/b189/_pkg_.a ache/go/1.25.8/x64/src/regexp/syntax/compile.go x_amd64/vet -p l -lang=go1.25 x_amd64/vet go_.�� 64/src/net SkhS/x3QNW0bkE1zmain 64/pkg/tool/linu-lang=go1.25 -I /tmp/go-build195-buildid -I 64/pkg/tool/linu-dwarf=false(dns block)/tmp/go-build3185549484/b001/launcher.test /tmp/go-build3185549484/b001/launcher.test -test.testlogfile=/tmp/go-build3185549484/b001/testlog.txt -test.paniconexit0 -test.timeout=10m0s --ve�� 359733afab535821 ache/go/1.25.8/x64/pkg/tool/linujson 4769656/b379/vet.cfg 524e3443524efd17git 5081425/b152/vetshow f33/log.json 841eeb0f198986d61e1/log.json -qE(dns block)invalid-host-that-does-not-exist-12345.com/tmp/go-build1955081425/b318/config.test /tmp/go-build1955081425/b318/config.test -test.testlogfile=/tmp/go-build1955081425/b318/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true -c=4 -nolocalimports -importcfg /tmp/go-build1955081425/b286/importcfg -pack /home/REDACTED/work/gh-aw-mcpg/gh-aw-mcpg/internal/oidc/provider.go 64/pkg/tool/linu/tmp/go-build1955081425/b124/vet.cfg --no�� go dedword.go 64/pkg/tool/linux_amd64/compile(dns block)/tmp/go-build1294769656/b322/config.test /tmp/go-build1294769656/b322/config.test -test.testlogfile=/tmp/go-build1294769656/b322/testlog.txt -test.paniconexit0 -test.timeout=10m0s --ctstate INVALID,NEW -j DROP --gdwarf-5 --64 -o x_amd64/compile(dns block)nonexistent.local/tmp/go-build1955081425/b336/launcher.test /tmp/go-build1955081425/b336/launcher.test -test.testlogfile=/tmp/go-build1955081425/b336/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true 64/src/runtime/c-I AqUKQ403Y de/node/bin/as(dns block)/tmp/go-build216210123/b264/launcher.test /tmp/go-build216210123/b264/launcher.test -test.testlogfile=/tmp/go-build216210123/b264/testlog.txt -test.paniconexit0 -test.timeout=10m0s 5081�� 5081425/b189/_pkg_.a ache/go/1.25.8/x64/src/regexp/syntax/compile.go x_amd64/vet -p l -lang=go1.25 x_amd64/vet go_.�� 64/src/net SkhS/x3QNW0bkE1zmain 64/pkg/tool/linu-lang=go1.25 -I /tmp/go-build195-buildid -I 64/pkg/tool/linu-dwarf=false(dns block)/tmp/go-build3185549484/b001/launcher.test /tmp/go-build3185549484/b001/launcher.test -test.testlogfile=/tmp/go-build3185549484/b001/testlog.txt -test.paniconexit0 -test.timeout=10m0s --ve�� 359733afab535821 ache/go/1.25.8/x64/pkg/tool/linujson 4769656/b379/vet.cfg 524e3443524efd17git 5081425/b152/vetshow f33/log.json 841eeb0f198986d61e1/log.json -qE(dns block)slow.example.com/tmp/go-build1955081425/b336/launcher.test /tmp/go-build1955081425/b336/launcher.test -test.testlogfile=/tmp/go-build1955081425/b336/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true 64/src/runtime/c-I AqUKQ403Y de/node/bin/as(dns block)/tmp/go-build216210123/b264/launcher.test /tmp/go-build216210123/b264/launcher.test -test.testlogfile=/tmp/go-build216210123/b264/testlog.txt -test.paniconexit0 -test.timeout=10m0s 5081�� 5081425/b189/_pkg_.a ache/go/1.25.8/x64/src/regexp/syntax/compile.go x_amd64/vet -p l -lang=go1.25 x_amd64/vet go_.�� 64/src/net SkhS/x3QNW0bkE1zmain 64/pkg/tool/linu-lang=go1.25 -I /tmp/go-build195-buildid -I 64/pkg/tool/linu-dwarf=false(dns block)/tmp/go-build3185549484/b001/launcher.test /tmp/go-build3185549484/b001/launcher.test -test.testlogfile=/tmp/go-build3185549484/b001/testlog.txt -test.paniconexit0 -test.timeout=10m0s --ve�� 359733afab535821 ache/go/1.25.8/x64/pkg/tool/linujson 4769656/b379/vet.cfg 524e3443524efd17git 5081425/b152/vetshow f33/log.json 841eeb0f198986d61e1/log.json -qE(dns block)this-host-does-not-exist-12345.com/tmp/go-build1955081425/b345/mcp.test /tmp/go-build1955081425/b345/mcp.test -test.testlogfile=/tmp/go-build1955081425/b345/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true --noprofile 64/src/runtime/c-ifaceassert x_amd64/vet(dns block)/tmp/go-build216210123/b252/mcp.test /tmp/go-build216210123/b252/mcp.test -test.testlogfile=/tmp/go-build216210123/b252/testlog.txt -test.paniconexit0 -test.timeout=10m0s 5081�� internal/chacha8rand ache/go/1.25.8/x64/src/runtime/cgo x_amd64/vet 5081425/b171/ /tmp/go-build195run(dns block)If you need me to access, download, or install something from one of these locations, you can either: