fix(errors): enhance retry logic for ProviderError #98
yuguorui wants to merge 1 commit into charmbracelet:main
Pull request overview
This PR enhances the retry logic for ProviderError to align with OpenAI's official SDK by adding support for retrying connection errors and server internal errors (5xx status codes). The changes make the retry behavior more resilient to transient network issues and server-side failures.
Key Changes:
- Added retry support for connection errors (specifically `io.ErrUnexpectedEOF`)
- Extended retry logic to include all server internal errors (status codes ≥ 500)
- Added reference documentation linking to OpenAI's SDK implementation
Align part of the retry logic to the OpenAI official SDK.
Signed-off-by: yuguorui <yuguorui@pku.edu.cn>
        return true
    }

    return m.StatusCode == http.StatusRequestTimeout || m.StatusCode == http.StatusConflict || m.StatusCode == http.StatusTooManyRequests || m.StatusCode >= http.StatusInternalServerError
I'm not sure if retrying for 500 is a good idea. Other providers might behave differently than OpenAI.
Thoughts @kujtimiihoxha?
I think it's fine to retry on 500, actually. It should eliminate some of the random internal errors I see sometimes.
Yes. Third-party LLM providers in particular still occasionally return 5xx errors, and when that happens the impact on the application is really frustrating. I think a relatively "polite" retry strategy (e.g., exponential backoff) is acceptable.