fix(cells) Don't record proxied request failures towards circuit breaker#111639
fix(cells) Don't record proxied request failures towards circuit breaker#111639
Conversation
We don't want to trip the breaker when the proxied requests fail with 500s. We're mostly interested in preventing resource exhaustion from slow requests/connection errors here. Refs INFRENG-275
| except Exception: | ||
| logger.exception("Failed to record circuitbreaker failure") | ||
|
|
There was a problem hiding this comment.
the try/except seems like it's not needed by all users of the circuit breaker? it seems like it's already logged and exceptions handled properly in the record_error() function, so users of the function don't have to worry about it
| if resp.status_code >= 500: | ||
| metrics.incr("apigateway.proxy.request_failed", tags=metric_tags) |
There was a problem hiding this comment.
I would rather keep >= 502
| if resp.status_code >= 500: | |
| metrics.incr("apigateway.proxy.request_failed", tags=metric_tags) | |
| if resp.status_code >= 502 and circuit_breaker is not None: | |
| metrics.incr("apigateway.proxy.request_failed", tags=metric_tags) | |
| circuit_breaker.record_error() |
These errors indicate failures from downstream loadbalancers that indicate that the downstream application is gone/timing out.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| except Exception: | ||
| logger.exception("Failed to record circuitbreaker failure") | ||
| if circuit_breaker is not None: | ||
| circuit_breaker.record_error() |
There was a problem hiding this comment.
Removed try/except allows record_error to suppress original exceptions
Medium Severity
The try/except around circuit_breaker.record_error() was removed based on the assumption that record_error() handles its own exceptions internally. However, record_error() calls self.limiter.use_quotas() and self.limiter.check_within_quotas() (via _should_trip) — both unprotected Redis operations that can throw. If Redis is flaky, an exception from record_error() in the Timeout handler prevents RequestTimeout() from being raised, and in the ConnectionError handler it replaces the original ConnectionError with an unrelated Redis exception.
Additional Locations (1)
There was a problem hiding this comment.
The record_error() method traps errors.


We don't want to trip the breaker when the proxied requests fail with 500s. We're mostly interested in preventing resource exhaustion from slow requests/connection errors here.
Refs INFRENG-275