fix(cells) Don't record proxied request failures towards circuit breaker #111639

Merged
markstory merged 5 commits into master from chore-tune-gateway-circuitbreaker
Mar 30, 2026

Conversation

@markstory
Member

We don't want to trip the breaker when the proxied requests fail with 500s. We're mostly interested in preventing resource exhaustion from slow requests/connection errors here.

Refs INFRENG-275

@markstory markstory requested a review from a team as a code owner March 26, 2026 16:21
@linear-code

linear-code bot commented Mar 26, 2026

@markstory markstory requested a review from a team March 26, 2026 16:21
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Mar 26, 2026
Comment thread src/sentry/hybridcloud/apigateway/proxy.py Outdated
Comment on lines +206 to +208
except Exception:
    logger.exception("Failed to record circuitbreaker failure")

Member

The try/except doesn't seem to be something every user of the circuit breaker should need. Exceptions already appear to be logged and handled properly inside the record_error() function, so callers don't have to worry about them.

Comment on lines 211 to 212
if resp.status_code >= 500:
    metrics.incr("apigateway.proxy.request_failed", tags=metric_tags)
Member

I would rather keep >= 502

Suggested change
-if resp.status_code >= 500:
-    metrics.incr("apigateway.proxy.request_failed", tags=metric_tags)
+if resp.status_code >= 502 and circuit_breaker is not None:
+    metrics.incr("apigateway.proxy.request_failed", tags=metric_tags)
+    circuit_breaker.record_error()

These errors come from downstream load balancers and indicate that the
downstream application is gone or timing out.
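
The threshold suggested in this comment can be sketched as follows. This is a minimal illustration, not the gateway's actual code: `record_response`, `LOAD_BALANCER_ERROR_THRESHOLD`, and `CountingBreaker` are hypothetical names, and the only assumption about the real breaker is that it exposes `record_error()`.

```python
from __future__ import annotations

# Hypothetical constant: per the review comment, statuses at or above 502 are
# treated as signs that the downstream application is gone or timing out.
LOAD_BALANCER_ERROR_THRESHOLD = 502


class CountingBreaker:
    """Stand-in for a circuit breaker; it only counts recorded errors."""

    def __init__(self) -> None:
        self.errors = 0

    def record_error(self) -> None:
        self.errors += 1


def record_response(status_code: int, circuit_breaker: CountingBreaker | None) -> bool:
    """Record a failure only for statuses at or above the threshold.

    Returns True when the response counted towards the breaker.
    """
    if status_code >= LOAD_BALANCER_ERROR_THRESHOLD and circuit_breaker is not None:
        circuit_breaker.record_error()
        return True
    return False


breaker = CountingBreaker()
# A plain 500 from the application does not count; 502 and 504 do.
results = [record_response(code, breaker) for code in (200, 404, 500, 502, 504)]
```

With this split, an application bug that returns 500 leaves the breaker untouched, while gateway-level failures still count towards tripping it.
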
Contributor

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Comment thread tests/sentry/hybridcloud/apigateway/test_proxy.py
except Exception:
    logger.exception("Failed to record circuitbreaker failure")
if circuit_breaker is not None:
    circuit_breaker.record_error()
Contributor

Removed try/except allows record_error to suppress original exceptions

Medium Severity

The try/except around circuit_breaker.record_error() was removed based on the assumption that record_error() handles its own exceptions internally. However, record_error() calls self.limiter.use_quotas() and self.limiter.check_within_quotas() (via _should_trip) — both unprotected Redis operations that can throw. If Redis is flaky, an exception from record_error() in the Timeout handler prevents RequestTimeout() from being raised, and in the ConnectionError handler it replaces the original ConnectionError with an unrelated Redis exception.
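
The masking behaviour described here is general Python semantics and can be reproduced in isolation. The sketch below assumes a breaker whose `record_error()` does raise; `FlakyBreaker`, `RequestTimeout`, and the two handler functions are hypothetical stand-ins, not the code under review.

```python
class RequestTimeout(Exception):
    """Hypothetical error the gateway wants callers to see on a timeout."""


class FlakyBreaker:
    """Stand-in breaker whose bookkeeping fails, e.g. because Redis is down."""

    def record_error(self) -> None:
        raise RuntimeError("redis unavailable")


def handle_timeout_unguarded(breaker) -> None:
    try:
        raise TimeoutError("upstream timed out")
    except TimeoutError:
        breaker.record_error()  # raises, so the next line never runs
        raise RequestTimeout()


def handle_timeout_guarded(breaker) -> None:
    try:
        raise TimeoutError("upstream timed out")
    except TimeoutError:
        try:
            breaker.record_error()
        except Exception:
            pass  # a real handler would log the failure here
        raise RequestTimeout()


def raised_type(func) -> str:
    """Return the name of the exception type that func lets escape."""
    try:
        func(FlakyBreaker())
    except Exception as exc:
        return type(exc).__name__
    return "nothing"


unguarded = raised_type(handle_timeout_unguarded)
guarded = raised_type(handle_timeout_guarded)
```

In the unguarded handler the caller sees the breaker's `RuntimeError` instead of `RequestTimeout`. Whether that can happen in practice depends on whether `record_error()` traps its own exceptions.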

Member Author

The record_error() method traps errors.
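
As an illustration of what "traps errors" means for callers, here is a minimal sketch of a breaker that guards its own bookkeeping. It is not the real circuit breaker implementation: `SelfGuardingBreaker` and `FailingLimiter` are hypothetical, and `use_quotas()` is simplified to take no arguments.

```python
import logging

logger = logging.getLogger(__name__)


class FailingLimiter:
    """Hypothetical quota backend that always fails, like an unreachable Redis."""

    def use_quotas(self) -> None:
        raise ConnectionError("redis unavailable")


class SelfGuardingBreaker:
    """Breaker sketch whose record_error() never lets an exception escape."""

    def __init__(self, limiter) -> None:
        self.limiter = limiter
        self.dropped = 0  # errors that could not be recorded

    def record_error(self) -> None:
        try:
            self.limiter.use_quotas()
        except Exception:
            # Trap and log so callers need no try/except of their own.
            self.dropped += 1
            logger.exception("Failed to record circuitbreaker failure")


breaker = SelfGuardingBreaker(FailingLimiter())
breaker.record_error()  # returns normally even though the limiter failed
```

When the method is written this way, a caller's own exception handler can call `record_error()` and still raise its intended error afterwards.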

@markstory markstory merged commit 5a5985a into master Mar 30, 2026
63 of 64 checks passed
@markstory markstory deleted the chore-tune-gateway-circuitbreaker branch March 30, 2026 14:27
@github-actions github-actions bot locked and limited conversation to collaborators Apr 15, 2026
3 participants