Fix session stuck in Stopping state when interrupt hangs #334

Open
brendanlong wants to merge 2 commits into main from claude/39ef0903-ec28-4acf-9882-70febe888bbd

Conversation

@brendanlong
Owner

Summary

  • Fixes sessions getting permanently stuck in "Stopping..." state when the SDK's interrupt() or close() calls hang
  • The stuck session cfe89d5c-9ebf-4240-bf7c-7b6ae2ef6da9 was manually fixed in the prod DB
  • Adds a 5-second timeout to interruptClaude() with fallback to force-clean the in-memory state
  • Adds onError handlers to the client-side stop/interrupt mutations so the UI recovers by refetching state instead of showing "Stopping..." forever
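The client-side recovery described in the last bullet can be sketched without any particular data-fetching library. Everything named here (`StopDeps`, `stopWithRecovery`, `fetchState`, the `UiState` values) is hypothetical and only illustrates the idea; the PR's actual hooks and mutation API are not shown in this conversation.

```typescript
// Hypothetical UI states; the real application may use different ones.
type UiState = "running" | "stopping" | "stopped";

// Hypothetical dependencies standing in for the stop mutation and a state refetch.
interface StopDeps {
  stop(): Promise<void>;
  fetchState(): Promise<UiState>;
}

// Shows "stopping" while the request is in flight. If the request fails,
// the UI is re-rendered from the server's actual state instead of staying stuck.
async function stopWithRecovery(
  deps: StopDeps,
  render: (state: UiState) => void,
): Promise<void> {
  render("stopping");
  try {
    await deps.stop();
    render("stopped");
  } catch {
    // Equivalent of an onError handler: refetch and show what is really there.
    render(await deps.fetchState());
  }
}
```

This recovery only helps once the request actually fails or returns, which is why the server-side timeout is needed as well.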

Root cause

The SDK's interrupt() can hang indefinitely if the agent is already dead or unresponsive. Since claude.interrupt awaits this call, the HTTP request never completes, and the client's mutation stays in isPending state forever.
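A minimal sketch of how the handler can be bounded, assuming a session record with a `currentQuery.interrupt()` method as quoted in the review. The `SessionState` shape, the `interruptWithDeadline` name, and the `forceClean` callback are illustrative assumptions, not the PR's actual code.

```typescript
// Hypothetical in-memory session record; the real shape is not shown in this PR.
interface SessionState {
  currentQuery: { interrupt(): Promise<void> } | null;
}

// Sentinel that cannot collide with the void result of interrupt().
const TIMED_OUT = Symbol("timed out");

// Waits for interrupt() up to timeoutMs. On timeout or rejection it falls back
// to forceClean so the caller (and the HTTP request) always completes.
async function interruptWithDeadline(
  state: SessionState,
  timeoutMs: number,
  forceClean: (state: SessionState) => void,
): Promise<"idle" | "interrupted" | "forced"> {
  if (!state.currentQuery) return "idle";
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<typeof TIMED_OUT>((resolve) => {
    timer = setTimeout(() => resolve(TIMED_OUT), timeoutMs);
  });
  try {
    const outcome = await Promise.race([state.currentQuery.interrupt(), deadline]);
    if (outcome === TIMED_OUT) {
      forceClean(state);
      return "forced";
    }
    return "interrupted";
  } catch {
    // The SDK call failed outright; clean up rather than propagate a stuck state.
    forceClean(state);
    return "forced";
  } finally {
    clearTimeout(timer);
  }
}
```

With a 5-second `timeoutMs`, a dead agent delays the stop request by at most that long instead of leaving it pending forever.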

Test plan

  • All 369 existing tests pass
  • TypeScript compiles cleanly
  • Test stopping a session that has no active agent (should complete immediately)
  • Test stopping a session with an active agent (should complete within 5 seconds even if SDK hangs)
  • Test that UI recovers from a failed stop attempt (should refetch and show actual state)

🤖 Generated with Claude Code

The SDK's interrupt() and close() calls can hang indefinitely if the
agent process is already dead or unresponsive. This causes the stop
mutation to never complete, leaving the UI stuck on "Stopping...".

Changes:
- Add 5-second timeout to interruptClaude() so it doesn't hang forever
- Add forceCleanSession() to clean up in-memory state when SDK calls
  fail or time out
- Add onError handlers to stop/interrupt mutations on the client so
  the UI recovers by refetching actual state instead of staying stuck

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces timeout handling and robust state recovery for Claude session interruptions. Key changes include adding onError handlers to frontend hooks to refetch state on failure, and implementing a withTimeout mechanism in the backend to force-clean session state if the SDK's interrupt call hangs or fails. Review feedback identifies a logic bug in the withTimeout implementation where the success of a void promise is indistinguishable from a timeout, and notes a potential resource leak due to un-cleared timers.

Comment on lines +609 to +614
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T | undefined> {
  return Promise.race([
    promise,
    new Promise<undefined>((resolve) => setTimeout(() => resolve(undefined), ms)),
  ]);
}

high

The current implementation of withTimeout has two issues:

  1. Ambiguity: It resolves to undefined both when the timeout occurs and when the input promise resolves to undefined. Since state.currentQuery.interrupt() returns Promise<void>, it always resolves to undefined on success, making the timeout check at line 631 always true.
  2. Resource Leak: The setTimeout timer is not cleared if the promise resolves before the timeout, which can lead to an accumulation of active timers in a long-running process.

I suggest using a wrapper object to distinguish between success and timeout, and ensuring the timer is cleared.

function withTimeout<T>(promise: Promise<T>, ms: number): Promise<{ ok: true; value: T } | { ok: false }> {
  let timeoutId: ReturnType<typeof setTimeout>;
  const timeoutPromise = new Promise<{ ok: false }>((resolve) => {
    timeoutId = setTimeout(() => resolve({ ok: false }), ms);
  });
  return Promise.race([
    promise.then((value) => ({ ok: true as const, value })),
    timeoutPromise,
  ]).finally(() => clearTimeout(timeoutId));
}
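To see why the wrapper object matters, the sketch below runs a variant of the suggested helper against two stand-in promises: one that resolves to `void` immediately and one that never settles. The names `interruptThatSucceeds`, `interruptThatHangs`, `TimeoutResult`, and `demo` are hypothetical test doubles, not part of the SDK or the PR.

```typescript
type TimeoutResult<T> = { ok: true; value: T } | { ok: false };

// Same idea as the suggestion above: tag the outcome and always clear the timer.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<TimeoutResult<T>> {
  let timeoutId: ReturnType<typeof setTimeout> | undefined;
  const timeoutPromise = new Promise<TimeoutResult<T>>((resolve) => {
    timeoutId = setTimeout(() => resolve({ ok: false }), ms);
  });
  return Promise.race([
    promise.then((value): TimeoutResult<T> => ({ ok: true, value })),
    timeoutPromise,
  ]).finally(() => clearTimeout(timeoutId));
}

// Hypothetical stand-ins for the SDK's interrupt() call.
const interruptThatSucceeds = (): Promise<void> => Promise.resolve();
const interruptThatHangs = (): Promise<void> => new Promise<void>(() => {});

async function demo(): Promise<{ succeeded: boolean; timedOut: boolean }> {
  const fast = await withTimeout(interruptThatSucceeds(), 50);
  const hung = await withTimeout(interruptThatHangs(), 50);
  // fast.value is undefined, yet fast.ok is true, so success and timeout differ.
  return { succeeded: fast.ok, timedOut: !hung.ok };
}
```

With the original `T | undefined` signature both calls would yield `undefined`, so the caller could not tell them apart.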

Comment on lines +630 to +631
const result = await withTimeout(state.currentQuery.interrupt(), INTERRUPT_TIMEOUT_MS);
if (result === undefined) {

high

Update this check to align with the improved withTimeout implementation that uses a wrapper object to distinguish between a successful interruption and a timeout.

Suggested change
- const result = await withTimeout(state.currentQuery.interrupt(), INTERRUPT_TIMEOUT_MS);
- if (result === undefined) {
+ const result = await withTimeout(state.currentQuery.interrupt(), INTERRUPT_TIMEOUT_MS);
+ if (!result.ok) {

When interrupt() times out, forceCleanSession was only cleaning up
in-memory state but leaving the CLI subprocess running. Now it calls
close() first, which triggers the SDK's subprocess kill chain
(SIGTERM immediately, SIGKILL after 5s).
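A rough sketch of the ordering this commit describes: ask the SDK to close first, then drop the in-memory state. The `QueryHandle` and `SessionEntry` shapes, the `sessions` map, and the `forceCleanSessionSketch` name are assumptions for illustration. Not awaiting `close()` is also an assumption, made here because the PR summary notes that `close()` itself can hang.

```typescript
// Hypothetical handle exposing the SDK's close() call.
interface QueryHandle {
  close(): void | Promise<void>;
}

// Hypothetical in-memory entry for one session.
interface SessionEntry {
  currentQuery: QueryHandle | null;
}

const sessions = new Map<string, SessionEntry>();

// Returns true if a session was found and cleaned, false otherwise.
function forceCleanSessionSketch(sessionId: string): boolean {
  const entry = sessions.get(sessionId);
  if (!entry) return false;

  if (entry.currentQuery) {
    try {
      // Fire and forget: this starts the subprocess shutdown without letting
      // a hung or failing close() block the cleanup below.
      void Promise.resolve(entry.currentQuery.close()).catch(() => {});
    } catch {
      // A synchronous throw from close() must not prevent cleanup either.
    }
  }

  // Drop the in-memory state so the session no longer looks active.
  entry.currentQuery = null;
  sessions.delete(sessionId);
  return true;
}
```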

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>