- Processes stuck for 40+ hours - Claude processes had no output for 140,000+ seconds
- Timeout not enforced -
SUBPROCESS_TIMEOUTconstant was defined but never used - No auto-resume on stuck processes - Script didn't detect and restart stuck sessions
- No context compaction handling - Didn't detect when Claude hit session limits
- No auto-continue mode - Never switched to
-cflag to resume sessions
def heartbeat_logger():
# Check for timeout - kill if stuck too long
if elapsed > SUBPROCESS_TIMEOUT:
log(f"[TIMEOUT] No output for {int(elapsed)}s (timeout: {SUBPROCESS_TIMEOUT}s)")
log(f"[TIMEOUT] Killing stuck subprocess PID {proc[0].pid}...")
proc[0].kill()Now the heartbeat thread actually kills stuck processes after 1 hour of no output.
def check_context_compaction(output):
"""Check if output indicates context window compaction or session limit."""
patterns = [
r"context.*window.*full",
r"compacting.*context",
r"session.*limit.*reached",
r"message.*history.*truncated",
r"conversation.*too.*long",
]Detects when Claude hits session limits and automatically switches to continue mode.
def check_stuck_or_timeout(ret_code, output):
"""Check if Claude got stuck or timed out."""
# Exit code -9 means killed by SIGKILL (our timeout killer)
if ret_code == -9 or ret_code == 137:
return TrueDetects when process was killed by timeout and automatically restarts with -c.
# After first successful run, always use continue flag
if not current_continue:
log("[SESSION] Switching to continue mode (-c) for subsequent runs...")
current_continue = TrueAfter the first run completes, all subsequent runs use -c to continue the session.
if ret_code != 0:
log(f"[ERROR] Will retry with continue flag (-c) in case session needs resuming...")
current_continue = True
run_number -= 1 # Retry with continue flag
time.sleep(10)
continueAny error now triggers a retry with continue mode after 10 seconds.
The script runs in foreground mode by default when called with --fg, but when you run it WITHOUT --fg, it:
- Spawns a background process with
--fg - Detaches it from your terminal
- Redirects stdout/stderr to the log file
- Returns immediately so you can use your terminal
The --fg in your ps output is normal and expected - it means the background wrapper is working correctly.
./claude-auto-resume "Implement all features in plan/ directory"This will:
- Start in background
- Return immediately
- Keep running forever
- Log to
.claude-auto-resume/session.log
./claude-auto-resume --statustail -f .claude-auto-resume/session.log./claude-auto-resume --stop./claude-auto-resume --force-stopRun this to test:
# Clean up old sessions
./claude-auto-resume --force-stop
# Start fresh session
./claude-auto-resume "Implement all features in plan/ directory"
# Follow logs in another terminal
tail -f .claude-auto-resume/session.log- Timeout enforcement: Kills stuck processes after 1 hour
- Auto-continue: Switches to
-cflag after first run - Context detection: Detects session limits and resumes
- Error recovery: Retries with continue flag on any error
- Stuck detection: Detects killed processes and restarts
- First run: Uses original prompt
- Subsequent runs: Uses
-cto continue - On timeout: Kills process, restarts with
-c - On error: Waits 10s, restarts with
-c - On context limit: Immediately restarts with
-c - On rate limit: Waits until reset, then continues
The script will never get stuck for 2 days again!