Refactor generic tests to use batch start and end dates by sydneynotthecity · Pull Request #225 · stellar/stellar-dbt-public

sydneynotthecity · 2026-04-08T17:24:38Z

PR Checklist

PR Structure

This PR has reasonably narrow scope (if not, break it down into smaller PRs).
This PR avoids mixing refactoring changes with feature changes (split into two PRs
otherwise).
This PR's title starts with the jira ticket associated with the PR.

Thoroughness

This PR adds tests for the most critical parts of the new functionality or fixes.
I've updated the docs and README with the added features, breaking changes, and new instructions on how to use the repository.

Release planning

I've decided whether this PR requires a new major/minor/patch version according to
semver, and I've changed the name of the BRANCH to major/*, minor/* or patch/*.

What

Migrated all generic and singular tests off the dbt_airflow_macros package so that they use the batch_start_date / batch_end_date variables, allowing for more targeted testing during backfills. Also optimized the uniqueness tests for BigQuery performance on high-cardinality tables. The refactor supports improving the entity attribution pipeline runtime, which now consistently takes 75+ minutes for a single day.

Specifically:

Refactored all 5 generic tests (incremental_accepted_values, incremental_not_null, incremental_unique, incremental_unique_combination_of_columns, incremental_expression_is_true) to use var("batch_start_date") / var("batch_end_date") instead of dbt_airflow_macros.ts(timezone=none)
Refactored all 5 remaining singular tests to replace dbt_airflow_macros.ts(timezone=none) with var("batch_end_date") as the anchor timestamp
Removed yu-iskw/dbt_airflow_macros from packages.yml
Dropped the less_than_equal_to parameter from all generic tests (was unused by every caller)
Retained greater_than_equal_to as a deprecated fallback that widens the lower bound: [batch_start_date - interval, batch_end_date)
Default generic test window is now [batch_start_date, batch_end_date) with an exclusive upper bound, matching model conventions
Optimized incremental_unique and incremental_unique_combination_of_columns by replacing GROUP BY / HAVING count(*) > 1 with QUALIFY ROW_NUMBER() OVER (PARTITION BY ...) > 1 to avoid full hash aggregation on high-cardinality columns
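The window rules above can be sketched in Python. This is only an illustration of the bounds, not the macro code in this PR: the function names are hypothetical, and modelling the deprecated greater_than_equal_to interval as a timedelta is an assumption (the real tests are Jinja/SQL).

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

Window = Tuple[datetime, datetime]


def resolve_test_window(
    batch_start: datetime,
    batch_end: datetime,
    lookback: Optional[timedelta] = None,  # stand-in for the deprecated interval
) -> Window:
    """Return (lower, upper) for the half-open window [lower, upper)."""
    # Without the fallback, the window is exactly the batch that was loaded.
    # With it, only the lower bound moves earlier; the upper bound is unchanged.
    lower = batch_start - lookback if lookback is not None else batch_start
    return lower, batch_end


def in_window(ts: datetime, window: Window) -> bool:
    """Inclusive lower bound, exclusive upper bound."""
    lower, upper = window
    return lower <= ts < upper


if __name__ == "__main__":
    start, end = datetime(2026, 4, 7), datetime(2026, 4, 8)
    default_window = resolve_test_window(start, end)
    widened_window = resolve_test_window(start, end, timedelta(days=2))
    print(in_window(end, default_window))                    # upper bound excluded
    print(in_window(datetime(2026, 4, 6), widened_window))   # covered by the fallback
```

Because the upper bound is exclusive, consecutive batches do not test the same row twice when one batch's end date is the next batch's start date.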

Why

Models and macros were fully migrated to batch_start_date / batch_end_date in a prior refactor, but all tests still depended on the dbt_airflow_macros package. The old pattern anchored on a single Airflow timestamp with hardcoded intervals, which was brittle and didn't align with flexible date range backfills. Tests should validate exactly the window that was loaded.

The GROUP BY / HAVING pattern in uniqueness tests is expensive on BigQuery for tables with 15-30M daily rows. QUALIFY ROW_NUMBER() avoids this by streaming through partitions without aggregation. This will speed up the entity attribution models by several minutes, where long-running tests (7+ min to finish) are currently a bottleneck.
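As a rough illustration of why the two patterns are interchangeable as a pass/fail check, the Python sketch below mimics both on an in-memory list of keys. It says nothing about BigQuery execution or performance, and the function names are hypothetical. One behavioral difference worth noting: the GROUP BY form yields one result per duplicated key, while the ROW_NUMBER form yields every surplus row, so failure counts can differ even though both are empty exactly when the keys are unique.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Set


def duplicate_keys_group_by(keys: Iterable[Hashable]) -> Set[Hashable]:
    """Mimics GROUP BY key HAVING count(*) > 1: one result per duplicated key."""
    counts = Counter(keys)
    return {key for key, n in counts.items() if n > 1}


def surplus_rows_row_number(keys: Iterable[Hashable]) -> List[Hashable]:
    """Mimics QUALIFY ROW_NUMBER() OVER (PARTITION BY key) > 1:
    every row after the first within its key is returned."""
    row_number: Dict[Hashable, int] = {}
    surplus: List[Hashable] = []
    for key in keys:
        row_number[key] = row_number.get(key, 0) + 1
        if row_number[key] > 1:
            surplus.append(key)
    return surplus


if __name__ == "__main__":
    sample = ["a", "b", "a", "c", "a", "b"]
    print(duplicate_keys_group_by(sample))
    print(surplus_rows_row_number(sample))
```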

The anomaly detection tests were not measuring anything actionable and have been removed in favor of Datafold tests. The rest of the singular tests can be removed after confirming they run properly in Datafold.

Known limitations

This does not refactor any specific model configurations that use these tests. We will need to audit greater_than_equal_to usage and remove/adjust as necessary.

This also does not refactor any references to airflow_start_timestamp, another env var passed from Airflow at runtime. That can be done at a future date.

Refactor tests to use batch start and end dates

9309552

sydneynotthecity requested a review from a team as a code owner April 8, 2026 17:24

sydneynotthecity wants to merge 1 commit into release-v20260413 from deprecate-dbt-airflow-macros
