bug: Safeoutputs MCP sessions expire during long-running agent tasks (~30 min idle timeout) #3078

## Summary

The MCP gateway drops safeoutputs sessions after approximately 30 minutes of inactivity. When an agent runs a long CPU-bound task (e.g., ML training, large builds) inside a shell tool call, no MCP requests are made during that period. The session expires server-side and becomes **unrecoverable** — all subsequent safeoutputs calls fail with `session not found`, and the agent has no way to deliver its results.

This is a **correctness issue**: safeoutputs is the only channel for the agent to report results, and it silently expires during the work the agent was asked to do.
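
The failure mode described above can be modelled with a small sketch. This is not the gateway's actual code: `IdleSessionStore`, its method names, and the exact 30-minute value are assumptions made only to illustrate how idle-based expiry behaves when no requests arrive during a long shell call.

```python
from dataclasses import dataclass, field


class SessionNotFound(Exception):
    """Stand-in for the gateway's `session not found` error."""


@dataclass
class IdleSessionStore:
    """Hypothetical session table with idle-based expiry (not the real gateway)."""

    idle_timeout_s: float = 30 * 60  # assumed from the ~30 min seen in this report
    last_seen: dict[str, float] = field(default_factory=dict)

    def open(self, session_id: str, now: float) -> None:
        self.last_seen[session_id] = now

    def touch(self, session_id: str, now: float) -> None:
        """Handle a request: reject it if the session idled out, else refresh it."""
        last = self.last_seen.get(session_id)
        if last is None or now - last > self.idle_timeout_s:
            # Once dropped, the session never comes back for this ID.
            self.last_seen.pop(session_id, None)
            raise SessionNotFound("session not found")
        self.last_seen[session_id] = now


store = IdleSessionStore()
store.open("s1", now=0.0)
store.touch("s1", now=25.0)  # early calls succeed and refresh the session

errors = []
# Roughly 33 minutes pass with no MCP traffic, then the agent retries a minute later.
for t in (25.0 + 33 * 60, 25.0 + 34 * 60):
    try:
        store.touch("s1", now=t)
    except SessionNotFound as exc:
        errors.append(str(exc))

print(errors)  # ['session not found', 'session not found']
```

In this model, retrying after the first failure cannot succeed, because the session entry has already been removed. That matches the "unrecoverable" behaviour reported here.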

## Upstream issue

github/gh-aw#23153 — two independent reports:
- [`dsyme/fv-squad`](https://github.com/dsyme/fv-squad/actions/runs/23607702532/job/68754992942) — 45-minute job, session expired
- [`githubnext/autoresearch_local`](https://github.com/githubnext/autoresearch_local/actions/runs/23924939318/job/69779658230) — 30-minute training run, session expired

## Reproduction timeline (autoresearch_local)

| Time (UTC) | Event |
|---|---|
| 22:29:07 | safeoutputs MCP server started (v0.2.9), session established |
| 22:29:32 | First `initialize` + `tools/list` calls succeed |
| 22:50–23:00 | Agent runs ML training (~30 min, no MCP calls) |
| 23:00:34 | Training completes successfully |
| 23:02:21 | **First safeoutputs failure**: `session not found` |
| 23:02–23:13 | All subsequent calls fail: `noop`, `create_pull_request`, `push_repo_memory`, `add_comment`, `missing_tool` |
| 23:13:48 | Agent gives up: "safeoutputs MCP session is permanently expired" |
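
As a quick check on the timeline, the snippet below computes the gap between the last successful calls listed in the table and the first failure, and the length of the retry window. It assumes the 22:29:32 calls were the last MCP activity before the failure, which the table implies but does not state outright. `gap_seconds` is a helper name introduced here for illustration.

```python
from datetime import datetime


def gap_seconds(start: str, end: str) -> int:
    """Seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


idle_gap = gap_seconds("22:29:32", "23:02:21")      # last success -> first failure
retry_window = gap_seconds("23:02:21", "23:13:48")  # first failure -> agent gives up

print(idle_gap, retry_window)  # 1969 687
```

Under that assumption the session sat idle for about 32 min 49 s, just past a 30-minute threshold, and the agent then retried for about 11 min 27 s.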

## Error

All calls return the same error:

```
✗ noop (MCP: safeoutputs)
  └ MCP server 'safeoutputs': Error: Streamable HTTP error: Error POSTing to endpoint: session not found

✗ create_pull_request (MCP: safeoutputs)
  └ MCP server 'safeoutputs': Error: Streamable HTTP error: Error POSTing to endpoint: session not found
```

## Impact

- **Zero safe outputs completed** — no PR, no comments, no repo-memory
- Training succeeded (val_bpb 2.236 → 2.107) but results were lost
- The agent spent ~11 minutes retrying, sleeping between attempts, before giving up
- Client-side workarounds (keepalive prompts) don't help because the agent can't send MCP calls while blocked on a long shell execution

## Proposed fixes

Any of these would resolve the issue:

1. **Remove or significantly extend session timeout for safeoutputs** — these sessions should live for the duration of the workflow (up to 6 hours for autoloop). A 30-minute idle timeout is incompatible with long-running tasks.

2. **Automatic keepalive from the gateway side** — the gateway could ping/refresh sessions internally rather than relying on client activity.

3. **Transparent session reconnect** — allow the client to re-establish a session when it receives `session not found`, without requiring manual intervention from the agent.
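
Option 3 could look roughly like the sketch below. It is a minimal illustration, not the actual MCP client or gateway API: `FakeGateway`, `ReconnectingClient`, and the `SessionNotFound` exception are hypothetical names, and the real error arrives as a `Streamable HTTP error` string rather than a typed exception. The sketch also assumes a failed session lookup means the tool never ran, so one retry on a fresh session is safe.

```python
from typing import Any


class SessionNotFound(Exception):
    """Stand-in for the `session not found` transport error."""


class FakeGateway:
    """Hypothetical in-memory gateway, used only to exercise the client."""

    def __init__(self) -> None:
        self.sessions: set[str] = set()
        self.counter = 0

    def initialize(self) -> str:
        self.counter += 1
        session_id = f"session-{self.counter}"
        self.sessions.add(session_id)
        return session_id

    def call(self, session_id: str, tool: str, args: dict[str, Any]) -> str:
        if session_id not in self.sessions:
            raise SessionNotFound("session not found")
        return f"{tool} ok"


class ReconnectingClient:
    """Re-initializes once and retries when the session has expired."""

    def __init__(self, gateway: FakeGateway) -> None:
        self.gateway = gateway
        self.session_id = gateway.initialize()

    def call_tool(self, tool: str, args: dict[str, Any]) -> str:
        try:
            return self.gateway.call(self.session_id, tool, args)
        except SessionNotFound:
            # Assumption: the lookup failed before the tool ran, so a single
            # retry on a new session does not duplicate any side effect.
            self.session_id = self.gateway.initialize()
            return self.gateway.call(self.session_id, tool, args)


gateway = FakeGateway()
client = ReconnectingClient(gateway)
gateway.sessions.clear()  # simulate server-side expiry during a long shell call

result = client.call_tool("create_pull_request", {"title": "results"})
print(result, client.session_id)  # create_pull_request ok session-2
```

The retry is limited to one attempt so that a gateway that keeps rejecting new sessions surfaces the error instead of looping. Any state the server tied to the old session would still be lost, so this complements rather than replaces options 1 and 2.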

## References

- Upstream issue: github/gh-aw#23153
- [Comment with detailed analysis](https://github.com/github/gh-aw/issues/23153#issuecomment-4181057579) by @insop
- Client-side keepalive attempt (ineffective): https://github.com/githubnext/autoresearch_local/commit/b856479
