feat(billing): Implement ClickHouse backend for UsageService #111799
org_id: int,
start: datetime,
end: datetime,
categories: Sequence[int],
Is the type of the category here the proto type or the relay type? IIRC it's the proto type, can we make that explicit?
    Condition(Column("timestamp"), Op.LT, end),
]
if categories:
    where.append(Condition(Column("category"), Op.IN, categories))
We will probably need to convert from the proto type to the relay type here
for day_str in sorted(days_map):
    date = _parse_day(day_str)
    usage = [
        CategoryUsage(category=cat, data=UsageData(**fields))  # type: ignore[arg-type]
And here we need to convert from the relay type back to the proto type
We'll need to defer this to a future migration. We have a lot of code that assumes/uses DataCategory (Relay).
…ckage These were created by the e2e test agent but not staged with the initial commit. BIL-2197
…version

The proto contract defines GetUsageRequest.end as inclusive (midnight of the last included day), matching how getsentry callers construct it from BillingHistory.period_end. The Snuba query uses a half-open [start, end) interval. Without the +1 day shift, the entire last day of the billing period would be excluded from ClickHouse results.

Example: caller passes end=2025-01-31T00:00:00Z (meaning "include Jan 31").
Without fix: timestamp < 2025-01-31T00:00:00Z → misses all of Jan 31.
With fix: timestamp < 2025-02-01T00:00:00Z → includes all hourly rows for Jan 31.

BIL-2197
Stores data at 6am on a day and passes end=midnight of that same day (the inclusive convention). Verifies the +1 day conversion ensures the data is returned. Without the fix, this test would fail because the Snuba half-open interval would exclude the entire day. BIL-2197
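The inclusive-to-half-open conversion described in these two commits can be illustrated with a minimal sketch. The helper names `to_exclusive_end` and `in_window` are hypothetical and are not taken from the PR; only the +1 day shift and the example dates come from the commit message.

```python
from datetime import datetime, timedelta, timezone


def to_exclusive_end(inclusive_end: datetime) -> datetime:
    """Turn an inclusive end (midnight of the last included day)
    into the exclusive upper bound of a half-open [start, end) interval."""
    return inclusive_end + timedelta(days=1)


def in_window(ts: datetime, start: datetime, inclusive_end: datetime) -> bool:
    """Mimic the query filter: start <= timestamp < shifted end."""
    return start <= ts < to_exclusive_end(inclusive_end)


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
end = datetime(2025, 1, 31, tzinfo=timezone.utc)        # means "include Jan 31"
row_ts = datetime(2025, 1, 31, 6, tzinfo=timezone.utc)  # hourly row at 6am on Jan 31

naive_match = start <= row_ts < end          # unshifted half-open filter
fixed_match = in_window(row_ts, start, end)  # filter with the +1 day shift
```

With the unshifted bound the 6am row falls outside the window; with the shift it is included, which is the case the integration test exercises.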
- Use Sequence[int] for _build_query categories parameter (covariant)
- Add type: ignore[arg-type] for CategoryUsage(category=int); the proto expects ValueType but billing uses raw Relay ints per BIL-2176
- Fix integration test dict keys with int(u.category) for DataCategory lookups
- Add Sequence import from collections.abc

BIL-2197
Assigns @getsentry/revenue as owner for billing test files, matching the existing /src/sentry/billing/ ownership. BIL-2197
Move outcome/reason mapping from Python into ClickHouse using sumIf conditional aggregation. This reduces query result rows from O(days × categories × outcomes × reasons) to O(days × categories), making the 10K row limit effectively unreachable for billing queries.

Before: GROUP BY outcome, reason, category, time → ~10K rows for large orgs over 90 days, with Python-side _map_outcome_to_fields post-processing.

After: GROUP BY category, time → ~1,350 rows max (90 days × 15 categories), with all 7 UsageData fields (total, accepted, dropped, filtered, over_quota, spike_protection, dynamic_sampling) computed directly in CH.

Uses endsWith() for over-quota suffix matching to avoid ClickHouse LIKE underscore-wildcard escaping issues.

BIL-2197
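A pure-Python emulation of the conditional aggregation may help show why the row count collapses to one row per (day, category). This is only a sketch: the outcome codes, the reason strings, and the mapping of RATE_LIMITED to `dropped` are assumptions for illustration, and `spike_protection` and `dynamic_sampling` are omitted because their conditions are not stated here. The real query runs in ClickHouse, not in Python.

```python
from collections import defaultdict

# Assumed outcome codes for this sketch only.
ACCEPTED, FILTERED, RATE_LIMITED = 0, 1, 2


def sum_if(rows, predicate):
    """Emulate sumIf(quantity, cond): sum quantity over rows matching cond."""
    return sum(r["quantity"] for r in rows if predicate(r))


def aggregate(rows):
    """Group by (day, category) only and compute each field conditionally."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["day"], r["category"])].append(r)
    return {
        key: {
            "accepted": sum_if(grp, lambda r: r["outcome"] == ACCEPTED),
            "filtered": sum_if(grp, lambda r: r["outcome"] == FILTERED),
            "dropped": sum_if(grp, lambda r: r["outcome"] == RATE_LIMITED),
            # Suffix match, analogous to endsWith(reason, '_usage_exceeded').
            "over_quota": sum_if(
                grp,
                lambda r: r["outcome"] == RATE_LIMITED
                and r["reason"].endswith("_usage_exceeded"),
            ),
        }
        for key, grp in groups.items()
    }


rows = [
    {"day": "2025-01-01", "category": 1, "outcome": ACCEPTED, "reason": "", "quantity": 10},
    {"day": "2025-01-01", "category": 1, "outcome": RATE_LIMITED, "reason": "error_usage_exceeded", "quantity": 4},
    {"day": "2025-01-01", "category": 1, "outcome": RATE_LIMITED, "reason": "project_quota", "quantity": 1},
    {"day": "2025-01-01", "category": 1, "outcome": FILTERED, "reason": "browser-extensions", "quantity": 2},
]
usage = aggregate(rows)
```

Four input rows (one per outcome and reason combination) reduce to a single output row, and `over_quota` is a subset of `dropped` rather than a separate bucket.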
…House

Proto DataCategory uses different int values from Relay/Sentry (e.g., proto ATTACHMENT=3 vs Relay ATTACHMENT=4). The proto request carries proto ints, but ClickHouse stores Relay ints. Without this conversion, category filtering would match the wrong categories for any type where proto != relay.

Adds _category_mapping.py with PROTO_TO_RELAY_CATEGORY mapping (mirrors getsentry's _category_mapping.py) and applies the conversion in query_outcomes_usage() before building the Snuba query.

Adds integration test verifying proto ATTACHMENT (3) correctly filters to Relay ATTACHMENT (4) data in ClickHouse.

BIL-2197
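A minimal sketch of the conversion step, under stated assumptions: only the ATTACHMENT pair (proto 3, Relay 4) comes from the commit message, the rest of the table is deliberately left out, and the helper `proto_to_relay` plus its error handling are hypothetical rather than the PR's actual code.

```python
from typing import Sequence

# Only this pair is taken from the commit message; the real
# PROTO_TO_RELAY_CATEGORY table has more entries, omitted here.
PROTO_TO_RELAY_CATEGORY = {3: 4}  # proto ATTACHMENT -> Relay ATTACHMENT

# Inverse table, for converting query results back to proto ints.
RELAY_TO_PROTO_CATEGORY = {relay: proto for proto, relay in PROTO_TO_RELAY_CATEGORY.items()}


def proto_to_relay(categories: Sequence[int]) -> list[int]:
    """Translate proto category ints to Relay ints before filtering.

    Raising on unknown ids (an assumption of this sketch) avoids silently
    filtering on the wrong category."""
    converted = []
    for category in categories:
        if category not in PROTO_TO_RELAY_CATEGORY:
            raise ValueError(f"unmapped proto category: {category}")
        converted.append(PROTO_TO_RELAY_CATEGORY[category])
    return converted


relay_filter = proto_to_relay([3])
```

Passing the proto ints straight through would filter on Relay category 3 instead of 4, which is the mismatch the integration test guards against.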
Sentry validates referrers via the Referrer StrEnum before sending queries to Snuba. Unregistered referrers generate warning logs and metrics noise. Register "billing.usage_service.clickhouse" and use the enum value instead of a raw string.
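The registration pattern can be sketched roughly as follows. This is a standalone illustration, not Sentry's actual `Referrer` class: the member name `BILLING_USAGE_SERVICE_CLICKHOUSE` and the `is_registered` helper are hypothetical, and a `str`-based `Enum` stands in for `StrEnum` so the sketch runs on older Python versions.

```python
from enum import Enum


class Referrer(str, Enum):
    # Hypothetical member name; only the string value comes from the commit.
    BILLING_USAGE_SERVICE_CLICKHOUSE = "billing.usage_service.clickhouse"


def is_registered(referrer: str) -> bool:
    """Return True when the referrer string is a known enum value."""
    return referrer in {member.value for member in Referrer}


# Using the enum member instead of a raw string keeps the value in one place.
referrer = Referrer.BILLING_USAGE_SERVICE_CLICKHOUSE.value
```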
The proto field is typed as DataCategory (proto enum) but all existing consumers interpret it as Relay/Sentry ints. Add a NOTE explaining this is intentional and pointing to the planned migration TODO.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
The PG BillingMetricUsage table only contains ACCEPTED, FILTERED, and RATE_LIMITED outcomes (getsentry outcomes consumer filters others). The CH outcomes table has all outcome types including INVALID, ABUSE, CLIENT_DISCARD, and CARDINALITY_LIMITED. Using bare sum(quantity) over-counted total vs PG. Switch to sumIf with outcome IN filter to match PG semantics: total = accepted + dropped + filtered.
Extract _BILLABLE_OUTCOMES constant and _total_function() helper so the total field can count either billable-only outcomes (for billing consumers matching PG semantics) or all outcomes (for future Stats v2 migration). The _build_query() function accepts a total_outcomes keyword parameter; query_outcomes_usage() passes _BILLABLE_OUTCOMES.
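The difference between the two `total` semantics can be shown with a small emulation. The outcome codes and the `total` helper below are assumptions for illustration; the PR's `_total_function()` builds a query expression rather than summing rows in Python.

```python
from typing import Iterable, Optional

# Assumed outcome codes for this sketch only.
ACCEPTED, FILTERED, RATE_LIMITED, INVALID = 0, 1, 2, 3

# Outcomes present in the PG table, per the commit message.
BILLABLE_OUTCOMES = frozenset({ACCEPTED, FILTERED, RATE_LIMITED})


def total(rows: Iterable[dict], total_outcomes: Optional[frozenset] = None) -> int:
    """Emulate sum(quantity) when total_outcomes is None, otherwise
    sumIf(quantity, outcome IN total_outcomes)."""
    return sum(
        r["quantity"]
        for r in rows
        if total_outcomes is None or r["outcome"] in total_outcomes
    )


rows = [
    {"outcome": ACCEPTED, "quantity": 10},
    {"outcome": FILTERED, "quantity": 2},
    {"outcome": RATE_LIMITED, "quantity": 5},
    {"outcome": INVALID, "quantity": 3},  # present in CH, absent from PG
]

all_total = total(rows)                          # bare sum: over-counts vs PG
billable_total = total(rows, BILLABLE_OUTCOMES)  # accepted + dropped + filtered
```

Keeping the outcome set as a parameter lets billing callers match PG semantics while leaving room for a consumer that wants every outcome counted.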

Summary
Replaces the UsageService.get_usage() stub with a real ClickHouse-backed implementation that queries the outcomes entity via Snuba (Phase 1 of the PG→CH billing usage migration).

- _outcomes_query.py: queries ClickHouse outcomes with daily granularity (Granularity(86400) on the hourly table), maps outcome+reason to UsageData proto fields
- service.py: calls query_outcomes_usage() instead of returning an empty response
- GetUsageRequest.end is inclusive (midnight of last included day) but Snuba uses half-open [start, end); adds +timedelta(days=1) to convert
- _is_over_quota_reason() suffix match (reason.endswith("_usage_exceeded")) instead of a hardcoded frozenset; future-proof for new categories
- … sample_rate=1.0) when query hits 10K row limit
- … billing.usage_service.clickhouse for observability
- … category column per BIL-2176)
- seats=[]: seat data stays in Postgres

Key design decisions
- Entity("outcomes") …
- +timedelta(days=1) for end date: … <= because hourly table has rows at each hour; <= midnight would miss 23 hours
- … f"{api_name}_usage_exceeded" from getsentry/quotas.py is reliable
- type: ignore[arg-type] for CategoryUsage: … ValueType enum but billing uses raw Relay ints per BIL-2176 convention (same pattern as getsentry PG backend)

What this does NOT include (handled in later PRs):
- … _to_billable() in getsentry's shadow.py)
- … get_billed_category() in getsentry)

Test Plan
- … raw_snql_query, test all outcome→field mappings, query construction, response building, overlapping semantics invariant, over-quota suffix matching, end date +1 day shift (tests/sentry/billing/platform/services/usage/test_outcomes_query.py)
- … OutcomesSnubaTest + store_outcomes(), end-to-end via UsageService().get_usage(), inclusive end date edge case (tests/snuba/billing/platform/services/usage/test_outcomes_integration.py)

Integration test data path (verified):
Fixes BIL-2197