-
Notifications
You must be signed in to change notification settings - Fork 964
Enhance Ralph loop with retries and fix Claude Code hanging #45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Greptile OverviewGreptile SummaryThis PR addresses the Claude Code hanging issue by implementing a stream-based monitoring approach with automatic hang detection and retry logic for transient errors. Key Changes:
Issues Found:
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Ralph as ralph.sh
participant RetryFn as run_with_retry
participant StreamFn as run_claude_with_stream
participant Claude as claude process
participant Monitor as Output Monitor
participant Killer as Timeout Killer
Ralph->>RetryFn: Execute iteration
RetryFn->>RetryFn: Set attempt = 1
alt Tool is Claude
RetryFn->>StreamFn: Call run_claude_with_stream
StreamFn->>Claude: Spawn with stream-json output
StreamFn->>Monitor: Start monitoring loop
loop While process alive
Monitor->>Monitor: Check kill -0
Monitor->>StreamFn: Grep for type:result
alt Result detected
Monitor->>Monitor: Set result_received=true
Monitor->>Killer: Spawn timeout killer
Monitor-->>StreamFn: Break loop
end
end
alt Claude exits cleanly
Claude-->>StreamFn: Exit normally
StreamFn->>Killer: Kill timeout process
else Claude hangs
Killer->>Claude: Kill after timeout
end
StreamFn->>StreamFn: Extract result from stream
StreamFn-->>RetryFn: Return output
else Tool is Amp
RetryFn->>RetryFn: Run amp directly
end
RetryFn->>RetryFn: Check is_retryable_error
alt Retryable error detected
RetryFn->>RetryFn: Increment attempt and backoff
RetryFn->>RetryFn: Retry up to MAX_RETRIES
else Success or non-retryable
RetryFn-->>Ralph: Return output
end
Ralph->>Ralph: Check for COMPLETE signal
alt Complete
Ralph->>Ralph: Exit success
else More work
Ralph->>Ralph: Next iteration
end
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 files reviewed, 3 comments
|
|
||
| # Temp file for stream output (cleaned up on exit) | ||
| STREAM_OUTPUT=$(mktemp) | ||
| trap "rm -f $STREAM_OUTPUT" EXIT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrap $STREAM_OUTPUT in quotes to handle paths with spaces safely
| trap "rm -f $STREAM_OUTPUT" EXIT | |
| trap "rm -f \"$STREAM_OUTPUT\"" EXIT |
Prompt To Fix With AI
This is a comment left during a code review.
Path: ralph.sh
Line: 105:105
Comment:
Wrap `$STREAM_OUTPUT` in quotes to handle paths with spaces safely
```suggestion
trap "rm -f \"$STREAM_OUTPUT\"" EXIT
```
How can I resolve this? If you propose a fix, please make it concise.|
|
||
| # Monitor the output file for the result message | ||
| # The key insight: "type":"result" is emitted BEFORE the hang occurs | ||
| while kill -0 $claude_pid 2>/dev/null; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Race condition: process could exit between kill -0 check and grep, causing the result check to be skipped. The loop will exit if the process dies naturally before detecting the result, potentially missing successful completions.
Prompt To Fix With AI
This is a comment left during a code review.
Path: ralph.sh
Line: 136:136
Comment:
Race condition: process could exit between `kill -0` check and grep, causing the result check to be skipped. The loop will exit if the process dies naturally before detecting the result, potentially missing successful completions.
How can I resolve this? If you propose a fix, please make it concise.| local attempt=1 | ||
| local delay=$INITIAL_RETRY_DELAY | ||
| local output="" | ||
| local exit_code=0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable exit_code is set but never checked - the retry logic only uses pattern matching via is_retryable_error, ignoring non-zero exit codes from successful runs with warnings
Prompt To Fix With AI
This is a comment left during a code review.
Path: ralph.sh
Line: 184:184
Comment:
Variable `exit_code` is set but never checked - the retry logic only uses pattern matching via `is_retryable_error`, ignoring non-zero exit codes from successful runs with warnings
How can I resolve this? If you propose a fix, please make it concise.
I faced the error described in #32 multiple times. So, I combined 2 solutions which are:
This fix worked for me, and allowed Ralph to continue working with Claude Code for 26 iterations until now.