
feat(billing): Implement ClickHouse backend for UsageService#111799

Merged
dashed merged 15 commits into master from billing/BIL-2197
Apr 1, 2026

Conversation

@dashed
Member

@dashed dashed commented Mar 30, 2026

Summary

Replaces the UsageService.get_usage() stub with a real ClickHouse-backed implementation that queries the outcomes entity via Snuba (Phase 1 of the PG→CH billing usage migration).

  • New module _outcomes_query.py: queries ClickHouse outcomes with daily granularity (Granularity(86400) on the hourly table), maps outcome+reason to UsageData proto fields
  • Wired service.py: calls query_outcomes_usage() instead of returning empty response
  • Outcome → UsageData mapping: ACCEPTED→accepted, RATE_LIMITED→dropped (with over_quota/spike_protection sub-fields), FILTERED→filtered (with dynamic_sampling sub-field)
  • Inclusive end date handling: proto GetUsageRequest.end is inclusive (midnight of last included day) but Snuba uses half-open [start, end) — adds +timedelta(days=1) to convert
  • Over-quota detection: uses _is_over_quota_reason() suffix match (reason.endswith("_usage_exceeded")) instead of hardcoded frozenset — future-proof for new categories
  • Truncation detection: logs warning + emits metric (sample_rate=1.0) when query hits 10K row limit
  • Dedicated referrer billing.usage_service.clickhouse for observability
  • Categories passed as Relay/Sentry ints directly (matching CH category column per BIL-2176)
  • seats=[] — seat data stays in Postgres
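
The over-quota check from the list above can be sketched as follows. This is a minimal sketch, not the code from this PR: only the `_usage_exceeded` suffix and the helper name `_is_over_quota_reason()` come from the description. The `Optional[str]` signature, the handling of `None`, and the example reason strings are assumptions for illustration.

```python
from typing import Optional

# Suffix taken from the PR description: f"{api_name}_usage_exceeded".
OVER_QUOTA_SUFFIX = "_usage_exceeded"


def is_over_quota_reason(reason: Optional[str]) -> bool:
    """Return True when a rate-limit reason ends with the over-quota suffix."""
    # Assumption: a missing reason (None) is treated as "not over quota".
    return reason is not None and reason.endswith(OVER_QUOTA_SUFFIX)


# The reason strings below are hypothetical and only exercise the check.
print(is_over_quota_reason("error_usage_exceeded"))   # True
print(is_over_quota_reason("replay_usage_exceeded"))  # True
print(is_over_quota_reason("usage_exceeded_soon"))    # False
print(is_over_quota_reason(None))                     # False
```

A suffix match accepts any new category that follows the naming pattern without a code change, which is the trade-off against an explicit set that has to be kept in sync by hand.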

Key design decisions

| Decision | Rationale |
| --- | --- |
| Hourly table via Entity("outcomes") | Daily table unreachable from Snuba HTTP API without snuba changes (deferred to Phase 4) |
| +timedelta(days=1) for end date | Proto end is inclusive but Snuba needs exclusive. Can't use <= because the hourly table has rows at each hour, so <= midnight would miss 23 hours |
| Suffix match for over-quota reasons | Hardcoded frozenset was incomplete and fragile. Pattern f"{api_name}_usage_exceeded" from getsentry/quotas.py is reliable |
| type: ignore[arg-type] for CategoryUsage | Proto expects ValueType enum but billing uses raw Relay ints per BIL-2176 convention (same pattern as getsentry PG backend) |
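
The end-date decision can be checked with plain `datetime` arithmetic. The sketch below is illustrative only: `to_exclusive_end` is a hypothetical helper name, and the list of hourly timestamps stands in for rows in the hourly table.

```python
from datetime import datetime, timedelta, timezone


def to_exclusive_end(inclusive_end: datetime) -> datetime:
    """Convert an inclusive end (midnight of the last included day) to an exclusive bound."""
    return inclusive_end + timedelta(days=1)


# One timestamp per hour of Jan 31, standing in for rows in the hourly table.
inclusive_end = datetime(2025, 1, 31, tzinfo=timezone.utc)
hourly_rows = [inclusive_end + timedelta(hours=h) for h in range(24)]

exclusive_end = to_exclusive_end(inclusive_end)

# timestamp < end + 1 day matches every hourly row of the last day.
matched_lt_shifted = sum(ts < exclusive_end for ts in hourly_rows)

# timestamp <= end matches only the 00:00 row and misses the other 23.
matched_le_midnight = sum(ts <= inclusive_end for ts in hourly_rows)

print(matched_lt_shifted, matched_le_midnight)  # 24 1
```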

What this does NOT include (handled in later PRs):

  • Trial filtering (_to_billable() in getsentry's shadow.py)
  • Billing category merging (get_billed_category() in getsentry)
  • Shadow validation / deviation analysis
  • Daily table routing in Snuba

Test Plan

  • 22 unit tests — mock raw_snql_query, test all outcome→field mappings, query construction, response building, overlapping semantics invariant, over-quota suffix matching, end date +1 day shift (tests/sentry/billing/platform/services/usage/test_outcomes_query.py)
  • 10 integration tests — real ClickHouse via OutcomesSnubaTest + store_outcomes(), end-to-end via UsageService().get_usage(), inclusive end date edge case (tests/snuba/billing/platform/services/usage/test_outcomes_integration.py)
  • Pre-commit passes clean
  • All 32 tests pass locally

Integration test data path (verified):

store_outcomes() → POST /tests/entities/outcomes/insert (Snuba)
  → INSERT outcomes_raw_local (ClickHouse)
  → MV fires synchronously → outcomes_hourly_local
Query: Entity("outcomes") → OutcomesStorageSelector → outcomes_hourly_local
  → Granularity(86400) aggregates hourly→daily → results

Fixes BIL-2197

@linear-code

linear-code bot commented Mar 30, 2026

Comment thread src/sentry/billing/platform/services/usage/_outcomes_query.py
org_id: int,
start: datetime,
end: datetime,
categories: Sequence[int],
Contributor


Is the type of the category here the proto type or the relay type? IIRC it's the proto type, can we make that explicit?

    Condition(Column("timestamp"), Op.LT, end),
]
if categories:
    where.append(Condition(Column("category"), Op.IN, categories))
Contributor


We will probably need to convert from the proto type to the relay type here

for day_str in sorted(days_map):
    date = _parse_day(day_str)
    usage = [
        CategoryUsage(category=cat, data=UsageData(**fields))  # type: ignore[arg-type]
Contributor


And here we need to convert from the relay type back to the proto type

Member Author


We'll need to defer this to a future migration. We have a lot of code that assumes or uses DataCategory (Relay).

also see https://github.com/getsentry/getsentry/blob/6452181df318ae43e493596ac14ac41289c678c0/getsentry/billing/platform/services/usage_pricer/service.py#L160-L163

@dashed dashed requested review from a team as code owners April 1, 2026 15:29
dashed added 8 commits April 1, 2026 11:33
…ckage

These were created by the e2e test agent but not staged with the
initial commit.

BIL-2197
…version

The proto contract defines GetUsageRequest.end as inclusive (midnight of
the last included day), matching how getsentry callers construct it from
BillingHistory.period_end. The Snuba query uses a half-open [start, end)
interval. Without the +1 day shift, the entire last day of the billing
period would be excluded from ClickHouse results.

Example: caller passes end=2025-01-31T00:00:00Z (meaning "include Jan 31").
Without fix: timestamp < 2025-01-31T00:00:00Z → misses all of Jan 31.
With fix: timestamp < 2025-02-01T00:00:00Z → includes all hourly rows
for Jan 31.

BIL-2197
Stores data at 6am on a day and passes end=midnight of that same day
(the inclusive convention). Verifies the +1 day conversion ensures the
data is returned. Without the fix, this test would fail because the
Snuba half-open interval would exclude the entire day.

BIL-2197
- Use Sequence[int] for _build_query categories parameter (covariant)
- Add type: ignore[arg-type] for CategoryUsage(category=int) — proto
  expects ValueType but billing uses raw Relay ints per BIL-2176
- Fix integration test dict keys with int(u.category) for DataCategory
  lookups
- Add Sequence import from collections.abc

BIL-2197
Assigns @getsentry/revenue as owner for billing test files, matching
the existing /src/sentry/billing/ ownership.

BIL-2197
Move outcome/reason mapping from Python into ClickHouse using sumIf
conditional aggregation. This reduces query result rows from
O(days × categories × outcomes × reasons) to O(days × categories),
making the 10K row limit effectively unreachable for billing queries.

Before: GROUP BY outcome, reason, category, time → ~10K rows for large
orgs over 90 days, with Python-side _map_outcome_to_fields post-processing.

After: GROUP BY category, time → ~1,350 rows max (90 days × 15 categories),
with all 7 UsageData fields (total, accepted, dropped, filtered,
over_quota, spike_protection, dynamic_sampling) computed directly in CH.

Uses endsWith() for over-quota suffix matching to avoid ClickHouse LIKE
underscore-wildcard escaping issues.

BIL-2197
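
The row reduction described in this commit can be illustrated with a pure-Python emulation of conditional aggregation. This is a sketch of the idea only: the real aggregation runs in ClickHouse via sumIf, the outcome names are written as strings for readability, and the two reason constants are hypothetical placeholders because the actual reason values are not given here. `total` is a plain sum, as in this commit.

```python
from collections import defaultdict

FIELDS = (
    "total", "accepted", "dropped", "filtered",
    "over_quota", "spike_protection", "dynamic_sampling",
)

# Hypothetical reason strings, used only for this illustration.
SPIKE_PROTECTION_REASON = "spike_protection"
DYNAMIC_SAMPLING_REASON = "dynamic_sampling"


def aggregate(rows):
    """Collapse (day, category, outcome, reason, quantity) rows into one
    record per (day, category), mimicking GROUP BY category, time with a
    sumIf-style column per UsageData field."""
    result = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    for day, category, outcome, reason, quantity in rows:
        bucket = result[(day, category)]
        reason = reason or ""
        bucket["total"] += quantity
        if outcome == "accepted":
            bucket["accepted"] += quantity
        elif outcome == "rate_limited":
            bucket["dropped"] += quantity
            if reason.endswith("_usage_exceeded"):
                bucket["over_quota"] += quantity
            elif reason == SPIKE_PROTECTION_REASON:
                bucket["spike_protection"] += quantity
        elif outcome == "filtered":
            bucket["filtered"] += quantity
            if reason == DYNAMIC_SAMPLING_REASON:
                bucket["dynamic_sampling"] += quantity
    return dict(result)


rows = [
    ("2026-03-01", 1, "accepted", None, 10),
    ("2026-03-01", 1, "rate_limited", "error_usage_exceeded", 3),
    ("2026-03-01", 1, "filtered", DYNAMIC_SAMPLING_REASON, 2),
]
usage = aggregate(rows)
print(len(rows), "input rows ->", len(usage), "output row")
```

Three input rows that differ only in outcome and reason collapse into a single (day, category) record, which is why the result size no longer depends on the number of outcomes or reasons.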
…House

Proto DataCategory uses different int values from Relay/Sentry
(e.g., proto ATTACHMENT=3 vs Relay ATTACHMENT=4). The proto request
carries proto ints, but ClickHouse stores Relay ints. Without this
conversion, category filtering would match the wrong categories for
any type where proto != relay.

Adds _category_mapping.py with PROTO_TO_RELAY_CATEGORY mapping
(mirrors getsentry's _category_mapping.py) and applies the conversion
in query_outcomes_usage() before building the Snuba query.

Adds integration test verifying proto ATTACHMENT (3) correctly filters
to Relay ATTACHMENT (4) data in ClickHouse.

BIL-2197
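
A minimal sketch of the proto-to-Relay conversion follows. Only the ATTACHMENT pair (proto 3, Relay 4) and the name `PROTO_TO_RELAY_CATEGORY` come from this commit message. The helper `to_relay_categories` is a hypothetical name, and raising on an unmapped value is an assumption, since the commit does not say how the real code handles that case.

```python
from typing import Dict, List, Sequence

# Only the ATTACHMENT pair is taken from the commit message; the real
# mapping covers more categories that are not listed there.
PROTO_TO_RELAY_CATEGORY: Dict[int, int] = {3: 4}


def to_relay_categories(proto_categories: Sequence[int]) -> List[int]:
    """Translate proto DataCategory ints into the Relay ints stored in ClickHouse."""
    relay: List[int] = []
    for category in proto_categories:
        # Assumption: an unmapped proto value is an error rather than a pass-through.
        if category not in PROTO_TO_RELAY_CATEGORY:
            raise ValueError(f"no Relay mapping for proto category {category}")
        relay.append(PROTO_TO_RELAY_CATEGORY[category])
    return relay


print(to_relay_categories([3]))  # [4]
```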
@dashed dashed force-pushed the billing/BIL-2197 branch from eb8a05d to f699305 Compare April 1, 2026 15:33
@dashed dashed removed request for a team April 1, 2026 15:34
dashed added 2 commits April 1, 2026 11:47
Sentry validates referrers via the Referrer StrEnum before sending
queries to Snuba. Unregistered referrers generate warning logs and
metrics noise. Register "billing.usage_service.clickhouse" and use
the enum value instead of a raw string.

The proto field is typed as DataCategory (proto enum) but all existing
consumers interpret it as Relay/Sentry ints. Add a NOTE explaining
this is intentional and pointing to the planned migration TODO.
Contributor

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Comment thread src/sentry/billing/platform/services/usage/_outcomes_query.py Outdated
dashed added 2 commits April 1, 2026 12:36
The PG BillingMetricUsage table only contains ACCEPTED, FILTERED, and
RATE_LIMITED outcomes (getsentry outcomes consumer filters others).
The CH outcomes table has all outcome types including INVALID, ABUSE,
CLIENT_DISCARD, and CARDINALITY_LIMITED. Using bare sum(quantity)
over-counted total vs PG. Switch to sumIf with outcome IN filter to
match PG semantics: total = accepted + dropped + filtered.

Extract _BILLABLE_OUTCOMES constant and _total_function() helper so
the total field can count either billable-only outcomes (for billing
consumers matching PG semantics) or all outcomes (for future Stats v2
migration). The _build_query() function accepts a total_outcomes
keyword parameter; query_outcomes_usage() passes _BILLABLE_OUTCOMES.
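
The effect of restricting `total` to billable outcomes can be shown with a small pure-Python sketch. It is not the `_total_function()` helper itself, which builds a ClickHouse expression. `total_quantity` is a hypothetical name, outcomes are written as strings for readability, and treating `None` as "count every outcome" is an assumption.

```python
from typing import Iterable, Optional, Sequence, Tuple

# The three outcomes the commit message lists as present in the PG table.
BILLABLE_OUTCOMES = ("accepted", "filtered", "rate_limited")


def total_quantity(
    rows: Iterable[Tuple[str, int]],
    total_outcomes: Optional[Sequence[str]] = None,
) -> int:
    """Sum quantity over (outcome, quantity) rows, optionally restricted to some outcomes."""
    if total_outcomes is None:
        # Assumption: None means "count every outcome".
        return sum(quantity for _, quantity in rows)
    allowed = set(total_outcomes)
    return sum(quantity for outcome, quantity in rows if outcome in allowed)


rows = [
    ("accepted", 10),
    ("filtered", 2),
    ("rate_limited", 3),
    ("invalid", 7),
    ("client_discard", 1),
]
print(total_quantity(rows, BILLABLE_OUTCOMES))  # 15
print(total_quantity(rows))                     # 23
```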

Labels

Scope: Backend (automatically applied to PRs that change backend components)
Scope: Frontend (automatically applied to PRs that change frontend components)

2 participants