Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds retry logic to handle transient connection failures when communicating with a guest via Unix socket. The function now automatically retries on ConnectionResetError exceptions, improving resilience.
- Added retry mechanism with configurable max retries and delay parameters
- Improved socket resource management with proper null checks
- Enhanced error handling to distinguish between recoverable (ConnectionResetError) and non-recoverable (OSError) errors
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| except ConnectionResetError as e: | ||
| print(f"Connection reset by peer (attempt {attempt+1}/{max_retries}): {e}", file=sys.stderr) | ||
| attempt += 1 | ||
| time.sleep(retry_delay) |
There was a problem hiding this comment.
The retry logic only increments attempt for ConnectionResetError, but the while loop condition checks attempt < max_retries at the start of each iteration. This means after the last retry fails, the loop will execute one more time unnecessarily. Consider moving the increment to the beginning of the loop or adjusting the loop condition.
| except SystemExit as e: | ||
| sys.exit(e.code) |
There was a problem hiding this comment.
Catching and re-raising SystemExit inside the retry loop causes the process to exit on the first retry, preventing any subsequent retry attempts. The SystemExit exception should be re-raised outside the retry loop or handled differently to allow retries to occur.
| except SystemExit as e: | |
| sys.exit(e.code) | |
| # Let SystemExit propagate naturally; do not catch it here. |
| else: | ||
| print("Failed to connect after multiple attempts.", file=sys.stderr) | ||
| sys.exit(1) |
There was a problem hiding this comment.
The while-else clause will execute even when the loop is exited via break statements (lines 43, 51), not just when max_retries is exceeded. This means success cases that break out of the loop will incorrectly trigger the failure message and exit. The else clause should only execute when retries are exhausted.
| import time | ||
|
|
||
|
|
||
| def run_guest(unix_socket, port, command, use_stdio=True, max_retries=3, retry_delay=1): |
There was a problem hiding this comment.
Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This PR fixes #16.
It also handles ConnectionResetError by trying again if there is a connection reset.
This behavior may not be ideal if it's possible for us to re-run commands.
We might consider a lesser behavior where we do slightly different things if the transaction breaks at different points in the transaction. @zestrada