feat(hybridcloud): async apigateway#111307

Merged
gi0baro merged 16 commits into master from gi0baro/async-apigateway
Mar 31, 2026

Conversation

@gi0baro
Member

@gi0baro gi0baro commented Mar 23, 2026

This makes the apigateway proxy async, with the aim of serving the relevant control silo deployment over ASGI rather than WSGI.

The rationale here is to avoid situations in which we exhaust the server's threadpool by just waiting for apigateway requests to complete, as we saw in INCs 2054/2056.
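The threadpool concern can be illustrated with a small, self-contained sketch. It is not code from this PR: `blocking_proxy`, `async_proxy` and `InFlight` are hypothetical names, and the upstream call is simulated with a sleep. It only shows that a fixed pool of worker threads caps how many waits can be in flight, while an event loop can hold all of them at once.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InFlight:
    """Tracks how many simulated upstream calls are waiting at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def enter(self) -> None:
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def leave(self) -> None:
        with self._lock:
            self.current -= 1


def blocking_proxy(tracker: InFlight, delay: float) -> None:
    """Hypothetical sync proxy call: the worker thread is parked while waiting."""
    tracker.enter()
    try:
        time.sleep(delay)
    finally:
        tracker.leave()


async def async_proxy(tracker: InFlight, delay: float) -> None:
    """Hypothetical async proxy call: waiting yields control back to the loop."""
    tracker.enter()
    try:
        await asyncio.sleep(delay)
    finally:
        tracker.leave()


def run_threaded(requests: int, workers: int, delay: float = 0.05) -> int:
    """Peak number of concurrent waits when served by a fixed thread pool."""
    tracker = InFlight()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(requests):
            pool.submit(blocking_proxy, tracker, delay)
    return tracker.peak


def run_async(requests: int, delay: float = 0.05) -> int:
    """Peak number of concurrent waits when served by a single event loop."""
    tracker = InFlight()

    async def main() -> None:
        await asyncio.gather(*(async_proxy(tracker, delay) for _ in range(requests)))

    asyncio.run(main())
    return tracker.peak
```

With six requests and two workers, `run_threaded(6, 2)` never has more than two waits in flight, so the remaining requests queue behind them. `run_async(6)` has all six waiting concurrently on one thread.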

Note: the APIGateway changes are isolated in a separate Python module, and the async flow is enabled through the SENTRY_ASYNC_APIGW environment variable. This lets us control the rollout of the change in prod. Tests and the local devserver, by contrast, always use the new code.

Detailed changes:

  • Make APIGateway proxy async, switching inner client impl from requests to httpx
  • Change APIGateway middleware to work both in ASGI and WSGI contexts (with the latter using async_to_sync)
  • Update relevant tests interacting with APIGateway
  • Fix proxy acceptance test
  • Fix ORM calls in the custom SDK integration
  • Bypass ORM calls in SDK custom logging integration
  • Restore/adapt circuit breakers
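As a rough illustration of the second bullet, the sketch below shows one way a single middleware class could serve both sync and async stacks. It is an assumption-laden toy, not the PR's middleware: requests are plain path strings, `proxy_request` and `should_proxy` are hypothetical helpers, and `asyncio.run` stands in for `async_to_sync` so that the example needs only the standard library.

```python
import asyncio
import inspect


async def proxy_request(path: str) -> str:
    """Hypothetical async proxy call standing in for the upstream request."""
    await asyncio.sleep(0)
    return f"proxied:{path}"


def should_proxy(path: str) -> bool:
    """Hypothetical routing rule: only paths under /region/ are proxied."""
    return path.startswith("/region/")


class HybridGatewayMiddleware:
    """Dispatches to an async or sync code path depending on the wrapped handler."""

    def __init__(self, get_response):
        self.get_response = get_response
        # An async get_response means we are running in an async (ASGI-style) stack.
        self.is_async = inspect.iscoroutinefunction(get_response)

    def __call__(self, path: str):
        if self.is_async:
            # Return a coroutine for the caller to await.
            return self._acall(path)
        if should_proxy(path):
            # Sync (WSGI-style) context: bridge into the async proxy.
            # asyncio.run is only a stand-in here for async_to_sync.
            return asyncio.run(proxy_request(path))
        return self.get_response(path)

    async def _acall(self, path: str):
        if should_proxy(path):
            return await proxy_request(path)
        return await self.get_response(path)
```

In the sync case the proxy coroutine still blocks the calling thread until it finishes, so the threadpool benefit only appears once the deployment actually runs under ASGI.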

@github-actions github-actions bot added the Scope: Backend label Mar 23, 2026
@gi0baro gi0baro force-pushed the gi0baro/async-apigateway branch from 6a7fe9d to cd37f83 Compare March 23, 2026 17:00
Comment thread src/sentry/hybridcloud/apigateway/proxy.py Outdated
Comment thread src/sentry/objectstore/endpoints/organization.py Outdated
@gi0baro gi0baro changed the title from "feat: async apigateway" to "feat(hybridcloud): async apigateway" Mar 23, 2026
Comment thread src/sentry/hybridcloud/apigateway/middleware.py Outdated
@github-actions
Contributor

github-actions bot commented Mar 23, 2026

Backend Test Failures

Failures on aa455fa in this run:

tests/sentry/objectstore/endpoints/test_organization.py::OrganizationObjectstoreEndpointWithControlSiloTest::test_roundtrip_compressed
tests/sentry/objectstore/endpoints/test_organization.py:291: in test_roundtrip_compressed
    response = self.client.post(
.venv/lib/python3.13/site-packages/django/test/client.py:1153: in post
    response = super().post(
.venv/lib/python3.13/site-packages/django/test/client.py:499: in post
    return self.generic(
.venv/lib/python3.13/site-packages/django/test/client.py:671: in generic
    return self.request(**r)
.venv/lib/python3.13/site-packages/django/test/client.py:1087: in request
    self.check_exception(response)
.venv/lib/python3.13/site-packages/django/test/client.py:802: in check_exception
    raise exc_value
.venv/lib/python3.13/site-packages/django/core/handlers/exception.py:55: in inner
    response = get_response(request)
.venv/lib/python3.13/site-packages/django/core/handlers/base.py:197: in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
.venv/lib/python3.13/site-packages/django/views/decorators/csrf.py:65: in _view_wrapper
    return view_func(request, *args, **kwargs)
.venv/lib/python3.13/site-packages/django/views/generic/base.py:105: in view
    return self.dispatch(request, *args, **kwargs)
src/sentry/silo/base.py:166: in override
    return handler(*args, **kwargs)
src/sentry/api/base.py:700: in handle
    raise self.AvailabilityError(message)
E   sentry.silo.base.SiloLimit.AvailabilityError: Received POST request at '/api/0/organizations/4557843054723088/objectstore/v1/objects/test/org=4557843054723088/' to server in CONTROL mode. This endpoint is available only in: REGION, MONOLITH
tests/sentry/objectstore/endpoints/test_organization.py::OrganizationObjectstoreEndpointWithControlSiloTest::test_full_cycle
tests/sentry/objectstore/endpoints/test_organization.py:220: in test_full_cycle
    response = self.client.post(
.venv/lib/python3.13/site-packages/django/test/client.py:1153: in post
    response = super().post(
.venv/lib/python3.13/site-packages/django/test/client.py:499: in post
    return self.generic(
.venv/lib/python3.13/site-packages/django/test/client.py:671: in generic
    return self.request(**r)
.venv/lib/python3.13/site-packages/django/test/client.py:1087: in request
    self.check_exception(response)
.venv/lib/python3.13/site-packages/django/test/client.py:802: in check_exception
    raise exc_value
.venv/lib/python3.13/site-packages/django/core/handlers/exception.py:55: in inner
    response = get_response(request)
.venv/lib/python3.13/site-packages/django/core/handlers/base.py:197: in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
.venv/lib/python3.13/site-packages/django/views/decorators/csrf.py:65: in _view_wrapper
    return view_func(request, *args, **kwargs)
.venv/lib/python3.13/site-packages/django/views/generic/base.py:105: in view
    return self.dispatch(request, *args, **kwargs)
src/sentry/silo/base.py:166: in override
    return handler(*args, **kwargs)
src/sentry/api/base.py:700: in handle
    raise self.AvailabilityError(message)
E   sentry.silo.base.SiloLimit.AvailabilityError: Received POST request at '/api/0/organizations/4557843053543440/objectstore/v1/objects/test/org=4557843053543440/' to server in CONTROL mode. This endpoint is available only in: REGION, MONOLITH
tests/sentry/middleware/test_proxy.py::FakedAPIProxyTest::test_through_api_gateway
tests/sentry/middleware/test_proxy.py:57: in test_through_api_gateway
    with self.api_gateway_proxy_stubbed():
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/contextlib.py:141: in __enter__
    return next(self.gen)
src/sentry/testutils/cases.py:708: in api_gateway_proxy_stubbed
    with mock.patch(
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/unittest/mock.py:1495: in __enter__
    original, local = self.get_original()
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/unittest/mock.py:1465: in get_original
    raise AttributeError(
E   AttributeError: <module 'sentry.hybridcloud.apigateway.proxy' from '/home/runner/work/sentry/sentry/src/sentry/hybridcloud/apigateway/proxy.py'> does not have the attribute 'external_request'
tests/sentry/objectstore/endpoints/test_organization.py::OrganizationObjectstoreEndpointWithControlSiloTest::test_health
tests/sentry/objectstore/endpoints/test_organization.py:203: in test_health
    response = self.client.get(
.venv/lib/python3.13/site-packages/django/test/client.py:1124: in get
    response = super().get(
.venv/lib/python3.13/site-packages/django/test/client.py:475: in get
    return self.generic(
.venv/lib/python3.13/site-packages/django/test/client.py:671: in generic
    return self.request(**r)
.venv/lib/python3.13/site-packages/django/test/client.py:1087: in request
    self.check_exception(response)
.venv/lib/python3.13/site-packages/django/test/client.py:802: in check_exception
    raise exc_value
.venv/lib/python3.13/site-packages/django/core/handlers/exception.py:55: in inner
    response = get_response(request)
.venv/lib/python3.13/site-packages/django/core/handlers/base.py:197: in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
.venv/lib/python3.13/site-packages/django/views/decorators/csrf.py:65: in _view_wrapper
    return view_func(request, *args, **kwargs)
.venv/lib/python3.13/site-packages/django/views/generic/base.py:105: in view
    return self.dispatch(request, *args, **kwargs)
src/sentry/silo/base.py:166: in override
    return handler(*args, **kwargs)
src/sentry/api/base.py:700: in handle
    raise self.AvailabilityError(message)
E   sentry.silo.base.SiloLimit.AvailabilityError: Received GET request at '/api/0/organizations/4557843062325264/objectstore/health' to server in CONTROL mode. This endpoint is available only in: REGION, MONOLITH

@github-actions
Contributor

github-actions bot commented Mar 23, 2026

Backend Test Failures

Failures on a339368 in this run:

tests/sentry/objectstore/endpoints/test_organization.py::OrganizationObjectstoreEndpointWithControlSiloTest::test_full_cycle
tests/sentry/objectstore/endpoints/test_organization.py:220: in test_full_cycle
    response = self.client.post(
.venv/lib/python3.13/site-packages/django/test/client.py:1153: in post
    response = super().post(
.venv/lib/python3.13/site-packages/django/test/client.py:499: in post
    return self.generic(
.venv/lib/python3.13/site-packages/django/test/client.py:671: in generic
    return self.request(**r)
.venv/lib/python3.13/site-packages/django/test/client.py:1087: in request
    self.check_exception(response)
.venv/lib/python3.13/site-packages/django/test/client.py:802: in check_exception
    raise exc_value
.venv/lib/python3.13/site-packages/django/core/handlers/exception.py:55: in inner
    response = get_response(request)
.venv/lib/python3.13/site-packages/django/core/handlers/base.py:197: in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
.venv/lib/python3.13/site-packages/django/views/decorators/csrf.py:65: in _view_wrapper
    return view_func(request, *args, **kwargs)
.venv/lib/python3.13/site-packages/django/views/generic/base.py:105: in view
    return self.dispatch(request, *args, **kwargs)
src/sentry/silo/base.py:166: in override
    return handler(*args, **kwargs)
src/sentry/api/base.py:700: in handle
    raise self.AvailabilityError(message)
E   sentry.silo.base.SiloLimit.AvailabilityError: Received POST request at '/api/0/organizations/4557843177734160/objectstore/v1/objects/test/org=4557843177734160/' to server in CONTROL mode. This endpoint is available only in: REGION, MONOLITH
tests/sentry/objectstore/endpoints/test_organization.py::OrganizationObjectstoreEndpointWithControlSiloTest::test_roundtrip_compressed
tests/sentry/objectstore/endpoints/test_organization.py:291: in test_roundtrip_compressed
    response = self.client.post(
.venv/lib/python3.13/site-packages/django/test/client.py:1153: in post
    response = super().post(
.venv/lib/python3.13/site-packages/django/test/client.py:499: in post
    return self.generic(
.venv/lib/python3.13/site-packages/django/test/client.py:671: in generic
    return self.request(**r)
.venv/lib/python3.13/site-packages/django/test/client.py:1087: in request
    self.check_exception(response)
.venv/lib/python3.13/site-packages/django/test/client.py:802: in check_exception
    raise exc_value
.venv/lib/python3.13/site-packages/django/core/handlers/exception.py:55: in inner
    response = get_response(request)
.venv/lib/python3.13/site-packages/django/core/handlers/base.py:197: in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
.venv/lib/python3.13/site-packages/django/views/decorators/csrf.py:65: in _view_wrapper
    return view_func(request, *args, **kwargs)
.venv/lib/python3.13/site-packages/django/views/generic/base.py:105: in view
    return self.dispatch(request, *args, **kwargs)
src/sentry/silo/base.py:166: in override
    return handler(*args, **kwargs)
src/sentry/api/base.py:700: in handle
    raise self.AvailabilityError(message)
E   sentry.silo.base.SiloLimit.AvailabilityError: Received POST request at '/api/0/organizations/4557843183894544/objectstore/v1/objects/test/org=4557843183894544/' to server in CONTROL mode. This endpoint is available only in: REGION, MONOLITH
tests/sentry/middleware/test_proxy.py::FakedAPIProxyTest::test_through_api_gateway
tests/sentry/middleware/test_proxy.py:57: in test_through_api_gateway
    with self.api_gateway_proxy_stubbed():
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/contextlib.py:141: in __enter__
    return next(self.gen)
src/sentry/testutils/cases.py:708: in api_gateway_proxy_stubbed
    with mock.patch(
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/unittest/mock.py:1495: in __enter__
    original, local = self.get_original()
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/unittest/mock.py:1465: in get_original
    raise AttributeError(
E   AttributeError: <module 'sentry.hybridcloud.apigateway.proxy' from '/home/runner/work/sentry/sentry/src/sentry/hybridcloud/apigateway/proxy.py'> does not have the attribute 'external_request'
tests/sentry/objectstore/endpoints/test_organization.py::OrganizationObjectstoreEndpointWithControlSiloTest::test_health
tests/sentry/objectstore/endpoints/test_organization.py:203: in test_health
    response = self.client.get(
.venv/lib/python3.13/site-packages/django/test/client.py:1124: in get
    response = super().get(
.venv/lib/python3.13/site-packages/django/test/client.py:475: in get
    return self.generic(
.venv/lib/python3.13/site-packages/django/test/client.py:671: in generic
    return self.request(**r)
.venv/lib/python3.13/site-packages/django/test/client.py:1087: in request
    self.check_exception(response)
.venv/lib/python3.13/site-packages/django/test/client.py:802: in check_exception
    raise exc_value
.venv/lib/python3.13/site-packages/django/core/handlers/exception.py:55: in inner
    response = get_response(request)
.venv/lib/python3.13/site-packages/django/core/handlers/base.py:197: in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
.venv/lib/python3.13/site-packages/django/views/decorators/csrf.py:65: in _view_wrapper
    return view_func(request, *args, **kwargs)
.venv/lib/python3.13/site-packages/django/views/generic/base.py:105: in view
    return self.dispatch(request, *args, **kwargs)
src/sentry/silo/base.py:166: in override
    return handler(*args, **kwargs)
src/sentry/api/base.py:700: in handle
    raise self.AvailabilityError(message)
E   sentry.silo.base.SiloLimit.AvailabilityError: Received GET request at '/api/0/organizations/4557843184746512/objectstore/health' to server in CONTROL mode. This endpoint is available only in: REGION, MONOLITH

@github-actions
Contributor

github-actions bot commented Mar 23, 2026

Backend Test Failures

Failures on 9f35efb in this run:

tests/sentry/objectstore/endpoints/test_organization.py::OrganizationObjectstoreEndpointWithControlSiloTest::test_health
tests/sentry/objectstore/endpoints/test_organization.py:203: in test_health
    response = self.client.get(
.venv/lib/python3.13/site-packages/django/test/client.py:1124: in get
    response = super().get(
.venv/lib/python3.13/site-packages/django/test/client.py:475: in get
    return self.generic(
.venv/lib/python3.13/site-packages/django/test/client.py:671: in generic
    return self.request(**r)
.venv/lib/python3.13/site-packages/django/test/client.py:1087: in request
    self.check_exception(response)
.venv/lib/python3.13/site-packages/django/test/client.py:802: in check_exception
    raise exc_value
.venv/lib/python3.13/site-packages/django/core/handlers/exception.py:55: in inner
    response = get_response(request)
.venv/lib/python3.13/site-packages/django/core/handlers/base.py:197: in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
.venv/lib/python3.13/site-packages/django/views/decorators/csrf.py:65: in _view_wrapper
    return view_func(request, *args, **kwargs)
.venv/lib/python3.13/site-packages/django/views/generic/base.py:105: in view
    return self.dispatch(request, *args, **kwargs)
src/sentry/silo/base.py:166: in override
    return handler(*args, **kwargs)
src/sentry/api/base.py:700: in handle
    raise self.AvailabilityError(message)
E   sentry.silo.base.SiloLimit.AvailabilityError: Received GET request at '/api/0/organizations/4557843326304272/objectstore/health' to server in CONTROL mode. This endpoint is available only in: REGION, MONOLITH
tests/sentry/middleware/test_proxy.py::FakedAPIProxyTest::test_through_api_gateway
tests/sentry/middleware/test_proxy.py:57: in test_through_api_gateway
    with self.api_gateway_proxy_stubbed():
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/contextlib.py:141: in __enter__
    return next(self.gen)
src/sentry/testutils/cases.py:708: in api_gateway_proxy_stubbed
    with mock.patch(
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/unittest/mock.py:1495: in __enter__
    original, local = self.get_original()
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/unittest/mock.py:1465: in get_original
    raise AttributeError(
E   AttributeError: <module 'sentry.hybridcloud.apigateway.proxy' from '/home/runner/work/sentry/sentry/src/sentry/hybridcloud/apigateway/proxy.py'> does not have the attribute 'external_request'
tests/sentry/objectstore/endpoints/test_organization.py::OrganizationObjectstoreEndpointWithControlSiloTest::test_roundtrip_compressed
tests/sentry/objectstore/endpoints/test_organization.py:291: in test_roundtrip_compressed
    response = self.client.post(
.venv/lib/python3.13/site-packages/django/test/client.py:1153: in post
    response = super().post(
.venv/lib/python3.13/site-packages/django/test/client.py:499: in post
    return self.generic(
.venv/lib/python3.13/site-packages/django/test/client.py:671: in generic
    return self.request(**r)
.venv/lib/python3.13/site-packages/django/test/client.py:1087: in request
    self.check_exception(response)
.venv/lib/python3.13/site-packages/django/test/client.py:802: in check_exception
    raise exc_value
.venv/lib/python3.13/site-packages/django/core/handlers/exception.py:55: in inner
    response = get_response(request)
.venv/lib/python3.13/site-packages/django/core/handlers/base.py:197: in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
.venv/lib/python3.13/site-packages/django/views/decorators/csrf.py:65: in _view_wrapper
    return view_func(request, *args, **kwargs)
.venv/lib/python3.13/site-packages/django/views/generic/base.py:105: in view
    return self.dispatch(request, *args, **kwargs)
src/sentry/silo/base.py:166: in override
    return handler(*args, **kwargs)
src/sentry/api/base.py:700: in handle
    raise self.AvailabilityError(message)
E   sentry.silo.base.SiloLimit.AvailabilityError: Received POST request at '/api/0/organizations/4557843329712144/objectstore/v1/objects/test/org=4557843329712144/' to server in CONTROL mode. This endpoint is available only in: REGION, MONOLITH
tests/sentry/objectstore/endpoints/test_organization.py::OrganizationObjectstoreEndpointWithControlSiloTest::test_full_cycle
tests/sentry/objectstore/endpoints/test_organization.py:220: in test_full_cycle
    response = self.client.post(
.venv/lib/python3.13/site-packages/django/test/client.py:1153: in post
    response = super().post(
.venv/lib/python3.13/site-packages/django/test/client.py:499: in post
    return self.generic(
.venv/lib/python3.13/site-packages/django/test/client.py:671: in generic
    return self.request(**r)
.venv/lib/python3.13/site-packages/django/test/client.py:1087: in request
    self.check_exception(response)
.venv/lib/python3.13/site-packages/django/test/client.py:802: in check_exception
    raise exc_value
.venv/lib/python3.13/site-packages/django/core/handlers/exception.py:55: in inner
    response = get_response(request)
.venv/lib/python3.13/site-packages/django/core/handlers/base.py:197: in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
.venv/lib/python3.13/site-packages/django/views/decorators/csrf.py:65: in _view_wrapper
    return view_func(request, *args, **kwargs)
.venv/lib/python3.13/site-packages/django/views/generic/base.py:105: in view
    return self.dispatch(request, *args, **kwargs)
src/sentry/silo/base.py:166: in override
    return handler(*args, **kwargs)
src/sentry/api/base.py:700: in handle
    raise self.AvailabilityError(message)
E   sentry.silo.base.SiloLimit.AvailabilityError: Received POST request at '/api/0/organizations/4557843328991248/objectstore/v1/objects/test/org=4557843328991248/' to server in CONTROL mode. This endpoint is available only in: REGION, MONOLITH

Comment thread src/sentry/objectstore/endpoints/organization.py
@github-actions
Contributor

github-actions bot commented Mar 24, 2026

Backend Test Failures

Failures on 3be2dab in this run:

tests/sentry/objectstore/endpoints/test_organization.py::OrganizationObjectstoreEndpointWithControlSiloTest::test_health
tests/sentry/objectstore/endpoints/test_organization.py:203: in test_health
    response = self.client.get(
.venv/lib/python3.13/site-packages/django/test/client.py:1124: in get
    response = super().get(
.venv/lib/python3.13/site-packages/django/test/client.py:475: in get
    return self.generic(
.venv/lib/python3.13/site-packages/django/test/client.py:671: in generic
    return self.request(**r)
.venv/lib/python3.13/site-packages/django/test/client.py:1087: in request
    self.check_exception(response)
.venv/lib/python3.13/site-packages/django/test/client.py:802: in check_exception
    raise exc_value
.venv/lib/python3.13/site-packages/django/core/handlers/exception.py:55: in inner
    response = get_response(request)
.venv/lib/python3.13/site-packages/django/core/handlers/base.py:197: in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
.venv/lib/python3.13/site-packages/django/views/decorators/csrf.py:65: in _view_wrapper
    return view_func(request, *args, **kwargs)
.venv/lib/python3.13/site-packages/django/views/generic/base.py:105: in view
    return self.dispatch(request, *args, **kwargs)
src/sentry/silo/base.py:166: in override
    return handler(*args, **kwargs)
src/sentry/api/base.py:700: in handle
    raise self.AvailabilityError(message)
E   sentry.silo.base.SiloLimit.AvailabilityError: Received GET request at '/api/0/organizations/4557847125360656/objectstore/health' to server in CONTROL mode. This endpoint is available only in: REGION, MONOLITH
tests/sentry/middleware/test_proxy.py::FakedAPIProxyTest::test_through_api_gateway
tests/sentry/middleware/test_proxy.py:57: in test_through_api_gateway
    with self.api_gateway_proxy_stubbed():
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/contextlib.py:141: in __enter__
    return next(self.gen)
src/sentry/testutils/cases.py:708: in api_gateway_proxy_stubbed
    with mock.patch(
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/unittest/mock.py:1495: in __enter__
    original, local = self.get_original()
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/unittest/mock.py:1465: in get_original
    raise AttributeError(
E   AttributeError: <module 'sentry.hybridcloud.apigateway.proxy' from '/home/runner/work/sentry/sentry/src/sentry/hybridcloud/apigateway/proxy.py'> does not have the attribute 'external_request'
tests/sentry/objectstore/endpoints/test_organization.py::OrganizationObjectstoreEndpointWithControlSiloTest::test_roundtrip_compressed
tests/sentry/objectstore/endpoints/test_organization.py:291: in test_roundtrip_compressed
    response = self.client.post(
.venv/lib/python3.13/site-packages/django/test/client.py:1153: in post
    response = super().post(
.venv/lib/python3.13/site-packages/django/test/client.py:499: in post
    return self.generic(
.venv/lib/python3.13/site-packages/django/test/client.py:671: in generic
    return self.request(**r)
.venv/lib/python3.13/site-packages/django/test/client.py:1087: in request
    self.check_exception(response)
.venv/lib/python3.13/site-packages/django/test/client.py:802: in check_exception
    raise exc_value
.venv/lib/python3.13/site-packages/django/core/handlers/exception.py:55: in inner
    response = get_response(request)
.venv/lib/python3.13/site-packages/django/core/handlers/base.py:197: in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
.venv/lib/python3.13/site-packages/django/views/decorators/csrf.py:65: in _view_wrapper
    return view_func(request, *args, **kwargs)
.venv/lib/python3.13/site-packages/django/views/generic/base.py:105: in view
    return self.dispatch(request, *args, **kwargs)
src/sentry/silo/base.py:166: in override
    return handler(*args, **kwargs)
src/sentry/api/base.py:700: in handle
    raise self.AvailabilityError(message)
E   sentry.silo.base.SiloLimit.AvailabilityError: Received POST request at '/api/0/organizations/4557847129423888/objectstore/v1/objects/test/org=4557847129423888/' to server in CONTROL mode. This endpoint is available only in: REGION, MONOLITH
tests/sentry/objectstore/endpoints/test_organization.py::OrganizationObjectstoreEndpointWithControlSiloTest::test_full_cycle
tests/sentry/objectstore/endpoints/test_organization.py:220: in test_full_cycle
    response = self.client.post(
.venv/lib/python3.13/site-packages/django/test/client.py:1153: in post
    response = super().post(
.venv/lib/python3.13/site-packages/django/test/client.py:499: in post
    return self.generic(
.venv/lib/python3.13/site-packages/django/test/client.py:671: in generic
    return self.request(**r)
.venv/lib/python3.13/site-packages/django/test/client.py:1087: in request
    self.check_exception(response)
.venv/lib/python3.13/site-packages/django/test/client.py:802: in check_exception
    raise exc_value
.venv/lib/python3.13/site-packages/django/core/handlers/exception.py:55: in inner
    response = get_response(request)
.venv/lib/python3.13/site-packages/django/core/handlers/base.py:197: in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
.venv/lib/python3.13/site-packages/django/views/decorators/csrf.py:65: in _view_wrapper
    return view_func(request, *args, **kwargs)
.venv/lib/python3.13/site-packages/django/views/generic/base.py:105: in view
    return self.dispatch(request, *args, **kwargs)
src/sentry/silo/base.py:166: in override
    return handler(*args, **kwargs)
src/sentry/api/base.py:700: in handle
    raise self.AvailabilityError(message)
E   sentry.silo.base.SiloLimit.AvailabilityError: Received POST request at '/api/0/organizations/4557847128506384/objectstore/v1/objects/test/org=4557847128506384/' to server in CONTROL mode. This endpoint is available only in: REGION, MONOLITH

Comment thread src/sentry/hybridcloud/apigateway/middleware.py Outdated
@github-actions
Contributor

Backend Test Failures

Failures on 437ff76 in this run:

tests/sentry/middleware/test_proxy.py::FakedAPIProxyTest::test_through_api_gateway
tests/sentry/middleware/test_proxy.py:57: in test_through_api_gateway
    with self.api_gateway_proxy_stubbed():
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/contextlib.py:141: in __enter__
    return next(self.gen)
src/sentry/testutils/cases.py:708: in api_gateway_proxy_stubbed
    with mock.patch(
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/unittest/mock.py:1495: in __enter__
    original, local = self.get_original()
/opt/hostedtoolcache/Python/3.13.1/x64/lib/python3.13/unittest/mock.py:1465: in get_original
    raise AttributeError(
E   AttributeError: <module 'sentry.hybridcloud.apigateway.proxy' from '/home/runner/work/sentry/sentry/src/sentry/hybridcloud/apigateway/proxy.py'> does not have the attribute 'external_request'

Comment thread src/sentry/objectstore/endpoints/organization.py
@github-actions
Contributor

github-actions bot commented Mar 24, 2026

Backend Test Failures

Failures on e59c6de in this run:

tests/sentry/middleware/test_proxy.py::FakedAPIProxyTest::test_through_api_gateway
tests/sentry/middleware/test_proxy.py:83: in test_through_api_gateway
    resp = self.get_success_response(
src/sentry/testutils/cases.py:630: in get_success_response
    assert_status_code(response, status_code)
src/sentry/testutils/asserts.py:46: in assert_status_code
    assert minimum <= response.status_code < maximum, response
E   AssertionError: <StreamingHttpResponse status_code=401, "application/json">
E   assert 401 < 202
E    +  where 401 = <StreamingHttpResponse status_code=401, "application/json">.status_code
tests/sentry/objectstore/endpoints/test_organization.py::OrganizationObjectstoreEndpointWithControlSiloTest::test_full_cycle
.venv/lib/python3.13/site-packages/_pytest/runner.py:340: in from_call
    result: Optional[TResult] = func()
.venv/lib/python3.13/site-packages/_pytest/runner.py:240: in <lambda>
    lambda: runtest_hook(item=item, **kwds), when=when, reraise=reraise
.venv/lib/python3.13/site-packages/pluggy/_hooks.py:513: in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
.venv/lib/python3.13/site-packages/pluggy/_manager.py:120: in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
.venv/lib/python3.13/site-packages/_pytest/threadexception.py:87: in pytest_runtest_call
    yield from thread_exception_runtest_hook()
.venv/lib/python3.13/site-packages/_pytest/threadexception.py:63: in thread_exception_runtest_hook
    yield
.venv/lib/python3.13/site-packages/_pytest/unraisableexception.py:90: in pytest_runtest_call
    yield from unraisable_exception_runtest_hook()
.venv/lib/python3.13/site-packages/_pytest/unraisableexception.py:80: in unraisable_exception_runtest_hook
    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))
E   pytest.PytestUnraisableExceptionWarning: Exception ignored in: <socket.socket fd=-1, family=2, type=1, proto=6>
E   
E   Traceback (most recent call last):
E     File "/home/runner/work/sentry/sentry/.venv/lib/python3.13/site-packages/django/db/backends/postgresql/compiler.py", line 69, in assemble_as_sql
E       list(map(list, zip(*value_rows)))
E                      ~~~^^^^^^^^^^^^^
E   ResourceWarning: unclosed <socket.socket fd=84, family=2, type=1, proto=6, laddr=('172.17.0.1', 51320), raddr=('172.17.0.1', 50161)>

@github-actions
Contributor

Backend Test Failures

Failures on f9d4c65 in this run:

tests/sentry/middleware/test_proxy.py::FakedAPIProxyTest::test_through_api_gateway
tests/sentry/middleware/test_proxy.py:83: in test_through_api_gateway
    resp = self.get_success_response(
src/sentry/testutils/cases.py:629: in get_success_response
    assert_status_code(response, status_code)
src/sentry/testutils/asserts.py:46: in assert_status_code
    assert minimum <= response.status_code < maximum, response
E   AssertionError: <StreamingHttpResponse status_code=401, "application/json">
E   assert 401 < 202
E    +  where 401 = <StreamingHttpResponse status_code=401, "application/json">.status_code

Comment thread src/sentry/hybridcloud/apigateway_async/proxy.py Outdated
Comment thread src/sentry/conf/server.py Outdated
# so that responses aren't modified after Content-Length is set, or have the
# response modifying middleware reset the Content-Length header.
# This is because CommonMiddleware Sets the Content-Length header for non-streaming responses.
APIGW_ASYNC = os.environ.get("SENTRY_ASYNC_APIGW", "").lower() in ("1", "true", "y", "yes")
Member

Suggested change
- APIGW_ASYNC = os.environ.get("SENTRY_ASYNC_APIGW", "").lower() in ("1", "true", "y", "yes")
+ APIGW_ASYNC = os.environ.get("SENTRY_ASYNC_APIGW", "") == "1"

Not blocking, but I usually would do the simplest thing that could work.

Member Author

We don't have a strict rule about "boolean" env vars across the org. I prefer to just map every possible case rather than dealing with the headaches of debugging why something doesn't work as expected because someone might have set the value to true rather than 1.
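The trade-off discussed here can be sketched with a small helper. `env_flag` and `TRUTHY` are hypothetical names, not part of the PR. The accepted values mirror the tuple in the snippet under review, and the optional `environ` argument exists only to make the behaviour easy to check.

```python
import os
from typing import Mapping, Optional

# Spellings accepted as "enabled"; mirrors the tuple in the reviewed line.
TRUTHY = frozenset({"1", "true", "y", "yes"})


def env_flag(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the variable is set to any accepted truthy spelling."""
    env = os.environ if environ is None else environ
    return env.get(name, "").lower() in TRUTHY


def env_flag_strict(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """The simpler variant from the suggestion: only the literal "1" enables it."""
    env = os.environ if environ is None else environ
    return env.get(name, "") == "1"
```

With `SENTRY_ASYNC_APIGW=True`, the permissive helper enables the flag while the strict one silently leaves it off, which is the debugging headache the reply refers to.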

self.concurrency = concurrency
self.counter_window = failures[0]
self.failures = failures[1]
self.semaphore = asyncio.Semaphore(self.concurrency)
Member

If we're running multiple replicas there doesn't seem to be a way to circuit break across the deployment; each replica will have to figure out that there is a problem independently. I'm guessing you went with this approach because our existing circuit breakers use sync-redis?

Member Author

Mainly for that reason, yes.

But also, I don't see much value in sharing the circuit-breaking data across the whole deployment (and across the next deployment, when we release stuff). Giving each worker (not the whole pod) its own state lets us, IMO, configure more granular limits and avoid overflowing the concurrency circuit just because of scaling and the number of running control pods. A single pod might also fail to connect to the upstream for $REASONS (e.g. machine-specific temporary network issues), and I don't think we want that single pod to overload the circuit for everything else.

Member

That's all fair. We'll be able to use metrics to see how many gateways instances have breakers open as well, and observe how coherent that state is too.
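
To make the per-worker design discussed in this thread concrete, here is a minimal sketch of a breaker whose state lives entirely in the process: a concurrency cap backed by `asyncio.Semaphore` plus a failure count over a time window. It is an assumption-laden illustration, not the PR's implementation: the `LocalCircuitBreaker` name, the single fixed window (the PR appears to flip between counters), and the `RuntimeError` signalling are all hypothetical.

```python
import asyncio
import time


class LocalCircuitBreaker:
    """Per-process breaker: concurrency cap plus failures per time window."""

    def __init__(self, concurrency: int, window_seconds: float, max_failures: int) -> None:
        self.window_seconds = window_seconds
        self.max_failures = max_failures
        self.semaphore = asyncio.Semaphore(concurrency)
        self._window_start = time.monotonic()
        self._failures = 0

    def _maybe_reset_window(self) -> None:
        # Start a fresh window once the current one has elapsed.
        now = time.monotonic()
        if now - self._window_start >= self.window_seconds:
            self._window_start = now
            self._failures = 0

    def is_open(self) -> bool:
        self._maybe_reset_window()
        return self._failures >= self.max_failures

    def record_failure(self) -> None:
        self._maybe_reset_window()
        self._failures += 1

    async def call(self, func):
        """Run the coroutine function `func` unless the breaker rejects it."""
        if self.is_open():
            raise RuntimeError("circuit open")
        if self.semaphore.locked():
            # Reject instead of queueing when all slots are taken.
            raise RuntimeError("concurrency limit reached")
        async with self.semaphore:
            try:
                return await func()
            except Exception:
                self.record_failure()
                raise
```

Because nothing here touches Redis or any other shared store, each worker trips and recovers on its own, which is the trade-off the thread weighs against deployment-wide coordination.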

Comment thread src/sentry/hybridcloud/apigateway_async/proxy.py Outdated
Comment on lines +158 to +163
@contextmanager
def mock_proxy_client(router: HttpxMockRouter):
    """Patch the proxy_client with a mock httpx.AsyncClient using the given router."""
    mock_client = httpx.AsyncClient(transport=httpx.MockTransport(router.handler))
    with patch("sentry.hybridcloud.apigateway_async.proxy.proxy_client", mock_client):
        yield mock_client
Member

Unfortunate that there isn't a library like responses for httpx 😢

from asgiref.sync import sync_to_async
from django.test.client import Client

class MockedProxy:
Member

Do the tests run with async mode?

Member Author

Yes, but since our tests run with Django's sync test client, the code path actually exercised is the async_to_sync one in the middleware.

The only tests that run with ASGI are the ones using the liveserver with control silo, as that server should launch following the devserver configuration, which is set to ASGI for control.

Comment thread src/sentry/utils/sdk.py
Comment on lines +273 to +274
# FIXME: when in ASGI, the call to `options.store` from `in_random_rollout`
# would fail, because of SyncOnlyOperation.
Member

This is going to be a big footgun if we can't use options in the async context.

Member Author

We can, as long as it's inside a sync_to_async block.

AFAICT this is the only problematic code section; it doesn't invalidate the usage of options.store in ASGI. But we might need to review more middlewares and add an async-specific flow. Hard to tell right now without further live testing.
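
The general idea of pushing a blocking lookup off the event loop can be sketched with the standard library alone. Note the substitution: this uses `asyncio.to_thread` (Python 3.9+) as a stand-in for asgiref's `sync_to_async`, which has its own thread-sensitivity semantics, and `read_option_blocking` is a hypothetical placeholder for a blocking options or ORM call.

```python
import asyncio
import threading
import time


def read_option_blocking(key: str) -> str:
    """Hypothetical stand-in for a blocking options/ORM lookup."""
    time.sleep(0.01)  # simulate blocking I/O
    return f"{key}@{threading.current_thread().name}"


async def handler() -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(read_option_blocking, "apigateway.proxy.timeout")
```

Calling `asyncio.run(handler())` returns the key followed by the name of the worker thread that served it, showing that the lookup did not run on the loop's own thread.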

Comment thread src/sentry/hybridcloud/apigateway_async/proxy.py
Comment thread src/sentry/hybridcloud/apigateway_async/middleware.py
@gi0baro gi0baro requested a review from markstory March 30, 2026 10:56
assert request.method is not None
query_params = request.GET

timeout = ENDPOINT_TIMEOUT_OVERRIDE.get(url_name, settings.GATEWAY_PROXY_TIMEOUT)
Contributor

Bug: The async proxy removes runtime timeout configuration. The default GATEWAY_PROXY_TIMEOUT is None, which can cause requests to hang indefinitely for most endpoints.
Severity: HIGH

Suggested Fix

Restore the previous timeout logic by first checking options.get("apigateway.proxy.timeout"), then falling back to settings.GATEWAY_PROXY_TIMEOUT, and finally applying endpoint-specific overrides. Alternatively, set a reasonable default timeout instead of None.

Location: src/sentry/hybridcloud/apigateway_async/proxy.py#L163

Potential issue: The new async proxy implementation removes the check for the runtime
option `options.get("apigateway.proxy.timeout")`. The fallback,
`settings.GATEWAY_PROXY_TIMEOUT`, is `None`, which disables timeouts for the `httpx`
client. Since only a few endpoints have explicit overrides in
`ENDPOINT_TIMEOUT_OVERRIDE`, most API gateway requests will have no timeout. This can
cause requests to hang indefinitely, potentially exhausting the connection pool and
defeating the purpose of the async refactor.
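
The fallback order suggested in this report (endpoint override, then runtime option, then static setting, then a non-`None` default) can be sketched as a pure function. Everything named here is an assumption for illustration: `resolve_timeout`, its parameters, and the `DEFAULT_TIMEOUT_SECONDS` value are not taken from the PR. The option value is passed in rather than fetched, since how to read options safely from async code is left open in this review.

```python
from typing import Mapping, Optional

DEFAULT_TIMEOUT_SECONDS = 10.0  # hypothetical default; not a value from the PR


def resolve_timeout(
    url_name: str,
    overrides: Mapping[str, float],
    option_value: Optional[float],
    setting_value: Optional[float],
) -> float:
    """Endpoint override first, then runtime option, then setting, then default."""
    if url_name in overrides:
        return overrides[url_name]
    for candidate in (option_value, setting_value):
        if candidate is not None:
            return candidate
    # Never return None, so the HTTP client always has a finite timeout.
    return DEFAULT_TIMEOUT_SECONDS
```

The key property is the last line: even when neither the option nor the setting is configured, the caller gets a finite timeout instead of `None`.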

Comment on lines +44 to +45
proxy_client = httpx.AsyncClient()
circuitbreakers = CircuitBreakerManager()
Contributor

Bug: The module-level CircuitBreakerManager creates asyncio.Semaphore instances that are not thread-safe, causing a RuntimeError when accessed from different worker threads in a multi-threaded server setup.
Severity: HIGH

Suggested Fix

The CircuitBreakerManager instance should not be a module-level singleton. It should be instantiated within a context that ensures each worker thread or process gets its own instance, preventing cross-thread access to event loop-bound objects.

Location: src/sentry/hybridcloud/apigateway_async/proxy.py#L44-L45

Potential issue: A module-level `CircuitBreakerManager` is initialized, which is shared
across all worker threads. The manager lazily creates `asyncio.Semaphore` instances,
which become bound to the event loop of the thread that first accesses them. In a
multi-threaded environment (like Granian with Python 3.13+), if another thread tries to
access the same circuit breaker, it will receive a semaphore bound to a different event
loop. This will raise a `RuntimeError: "Task got Future attached to a different loop"`,
causing a runtime crash.
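
If the cross-thread concern raised in this report turns out to be real, one way to give each worker thread its own manager is a `threading.local` accessor. This is a hedged sketch only: the `CircuitBreakerManager` below is an empty stand-in for the PR's class, and `get_circuitbreakers` is a hypothetical helper, not something the PR defines.

```python
import threading


class CircuitBreakerManager:
    """Hypothetical stand-in for the PR's manager of per-upstream breakers."""

    def __init__(self) -> None:
        self.breakers: dict = {}


_local = threading.local()


def get_circuitbreakers() -> CircuitBreakerManager:
    """Return the calling thread's manager, creating it on first use."""
    manager = getattr(_local, "manager", None)
    if manager is None:
        manager = _local.manager = CircuitBreakerManager()
    return manager
```

Each thread then lazily builds its own semaphores on its own event loop, at the cost of breaker state no longer being shared between threads in the same process.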

Contributor

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


def window_overflow(self) -> bool:
    self._maybe_counter_flip()
    return self._counters[self._counter_idx] > self.failures
Contributor

Circuit breaker off-by-one in failure threshold check

Low Severity

The window_overflow method uses > instead of >= when comparing the failure counter against self.failures (sourced from APIGATEWAY_PROXY_MAX_FAILURES). This means if the max is configured to 100, the circuit breaker actually allows 101 failures before tripping. Given the setting name implies an upper bound, >= would match the intended semantics.
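
The difference between the two comparisons can be checked with a tiny simulation. This is an illustrative helper (`failures_allowed` is a made-up name), not part of the PR: it counts how many failures have been recorded when the overflow check first returns true.

```python
def failures_allowed(max_failures: int, inclusive: bool) -> int:
    """Count failures recorded by the time the overflow check first trips."""
    count = 0
    while True:
        # `inclusive` models `>=`; otherwise model the `>` used in the reviewed code.
        tripped = count >= max_failures if inclusive else count > max_failures
        if tripped:
            return count
        count += 1
```

With a configured maximum of 100, the `>` form trips at 101 recorded failures and the `>=` form at 100, which is the off-by-one the report describes.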

Comment thread src/sentry/utils/sdk.py
# for the moment let's just simplify and skip this entirely.
asyncio.get_running_loop()
return None
except Exception:
Contributor

Overly broad exception catch masks potential errors

Low Severity

asyncio.get_running_loop() only raises RuntimeError when no event loop is running, but the handler catches bare Exception. This silently swallows any unexpected error that might occur in the try block, making future bugs in this area harder to diagnose. Using except RuntimeError would correctly express the intent.
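
The narrower handler suggested here can be sketched as a standalone check. The `in_async_context` name is hypothetical; the only library behaviour relied on is that `asyncio.get_running_loop()` raises `RuntimeError` when no loop is running in the current thread.

```python
import asyncio


def in_async_context() -> bool:
    """Return True when called from inside a running event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # Raised only when no event loop is running in this thread.
        return False
    return True
```

Any other exception now propagates instead of being silently swallowed, which is the behaviour the report argues for.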

Member

@markstory markstory left a comment

Approving so that we can test this out further with a small amount of live traffic to see if we should continue to figure out how to run this long term.

assert request.method is not None
query_params = request.GET

timeout = ENDPOINT_TIMEOUT_OVERRIDE.get(url_name, settings.GATEWAY_PROXY_TIMEOUT)
Member

We should use the apigateway.proxy.timeout option as a fallback instead of the setting as we don't have any values for the setting in production.

Member Author

I will address this in the next steps. Calling into options here might be a bad idea (blocking code), and we might instead start using the setting in production. It needs proper testing.

    url_name: str,
) -> HttpResponseBase:
    """Take a django request object and proxy it to a cell silo"""
    metric_tags = {"region": cell.name, "url_name": url_name}
Member

Should we have a metric tag that lets us know metric values are coming from the async gateway and not the sync one? Right now I don't think we'd be able to tell the two apart other than by pod names.

Member Author

Gonna address this in a following PR; for now the pod name should be good, as we'll run this in a single pod.

@gi0baro gi0baro merged commit b71c092 into master Mar 31, 2026
133 of 134 checks passed
@gi0baro gi0baro deleted the gi0baro/async-apigateway branch March 31, 2026 09:31
dashed pushed a commit that referenced this pull request Apr 1, 2026
This changes the `apigateway` proxy to be async, with the idea to serve
the relevant deployment of control silo in ASGI rather than WSGI.

The rationale here is to avoid situations in which we exhaust the
server's threadpool by just waiting for apigateway requests to complete,
as we saw in INCs 2054/2056.

**Note:** the APIGateway changes are gated into a separate Python
module; the async flow is enabled through the `SENTRY_ASYNC_APIGW`
environment variable. This allows us to control the rollout of the
change in prod. Tests and the local devserver instead always use the
new code.

Detailed changes:
- [x] Make APIGateway proxy `async`, switching inner client impl from
`requests` to `httpx`
- [x] Change APIGateway middleware to work both in ASGI and WSGI
contexts (with the latter using `async_to_sync`)
- [x] Update relevant tests interacting with APIGateway
- [x] Fix proxy acceptance test
- <s> Fix ORM calls in the custom SDK integration</s>
- [x] Bypass ORM calls in SDK custom logging integration
- [x] Restore/adapt circuit breakers
@github-actions github-actions bot locked and limited conversation to collaborators Apr 15, 2026

Labels

Scope: Backend Automatically applied to PRs that change backend components

3 participants