
Refactor generic tests to use batch start and end dates #225

Open
sydneynotthecity wants to merge 1 commit into release-v20260413 from deprecate-dbt-airflow-macros

@sydneynotthecity

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with the Jira ticket associated with the PR.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated the docs and README with the added features, breaking changes, and new instructions on how to use the repository.

Release planning

  • I've decided whether this PR requires a new major/minor/patch version according to
    semver, and I've changed the name of the BRANCH to major/*, minor/*, or patch/*.

What

Migrated all generic and singular tests off the dbt_airflow_macros package to use the batch_start_date / batch_end_date variables, allowing for more targeted testing during backfills. Also optimized the uniqueness tests for BigQuery performance on high-cardinality tables. The refactor supports improving the runtime of the entity attribution pipeline, which currently takes a consistent 75+ minutes for a single day.

Specifically:

  • Refactored all 5 generic tests (incremental_accepted_values, incremental_not_null, incremental_unique, incremental_unique_combination_of_columns, incremental_expression_is_true) to use var("batch_start_date") / var("batch_end_date") instead of dbt_airflow_macros.ts(timezone=none)
  • Refactored all 5 remaining singular tests to replace dbt_airflow_macros.ts(timezone=none) with var("batch_end_date") as the anchor timestamp
  • Removed yu-iskw/dbt_airflow_macros from packages.yml
  • Dropped the less_than_equal_to parameter from all generic tests (it was unused by every caller)
  • Retained greater_than_equal_to as a deprecated fallback that widens the lower bound: [batch_start_date - interval, batch_end_date)
  • Default generic test window is now [batch_start_date, batch_end_date) with an exclusive upper bound, matching model conventions
  • Optimized incremental_unique and incremental_unique_combination_of_columns by replacing GROUP BY / HAVING count(*) > 1 with QUALIFY ROW_NUMBER() OVER (PARTITION BY ...) > 1 to avoid full hash aggregation on high-cardinality columns
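
A minimal sketch of the window semantics described above, written in Python rather than the dbt Jinja the tests actually use. The names batch_test_window, in_window, and window_predicate, the lookback argument (a stand-in for whatever interval greater_than_equal_to supplies), and the closed_at column are all hypothetical and used only for illustration; the real macros may compute or render the bounds differently.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

Window = Tuple[datetime, datetime]


def batch_test_window(
    batch_start_date: datetime,
    batch_end_date: datetime,
    lookback: Optional[timedelta] = None,  # hypothetical stand-in for the deprecated greater_than_equal_to
) -> Window:
    """Return the half-open window [lower, upper) a generic test would scan."""
    if batch_end_date <= batch_start_date:
        raise ValueError("batch_end_date must be after batch_start_date")
    # Default is [batch_start_date, batch_end_date); the fallback only widens the lower bound.
    lower = batch_start_date - lookback if lookback is not None else batch_start_date
    return lower, batch_end_date


def in_window(ts: datetime, window: Window) -> bool:
    lower, upper = window
    # Inclusive lower bound, exclusive upper bound, matching the model convention.
    return lower <= ts < upper


def window_predicate(column: str, window: Window) -> str:
    """Render an equivalent SQL filter (hypothetical; the real tests render this via Jinja)."""
    lower, upper = window
    return (
        f"{column} >= '{lower.isoformat(sep=' ')}' "
        f"and {column} < '{upper.isoformat(sep=' ')}'"
    )


if __name__ == "__main__":
    day = batch_test_window(datetime(2026, 4, 7), datetime(2026, 4, 8))
    print(in_window(datetime(2026, 4, 8), day))  # False: the upper bound is exclusive
    print(window_predicate("closed_at", day))
    # closed_at >= '2026-04-07 00:00:00' and closed_at < '2026-04-08 00:00:00'
```

Because the upper bound is exclusive, back-to-back daily batches never both test a row stamped exactly at midnight, and a multi-day backfill is tested over exactly the range that was loaded.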

Why

Models and macros were fully migrated to batch_start_date / batch_end_date in a prior refactor, but all tests still depended on the dbt_airflow_macros package. The old pattern anchored on a single Airflow timestamp with hardcoded intervals, which was brittle and didn't align with flexible date range backfills. Tests should validate exactly the window that was loaded.

The GROUP BY / HAVING pattern in the uniqueness tests is expensive on BigQuery for tables with 15-30M daily rows. QUALIFY ROW_NUMBER() avoids this by streaming through partitions without aggregation. This will speed up the entity attribution models by several minutes, since long-running tests (7+ min to finish) are currently a bottleneck there.
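
The sketch below checks that the two uniqueness formulations agree on pass/fail, using Python's built-in sqlite3 module. It is only an equivalence check, not a performance measurement: SQLite has no QUALIFY clause, so the ROW_NUMBER() filter is applied in an outer query instead, and nothing here reflects BigQuery cost. The events table, id column, and helper functions are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Old pattern: aggregate every key, then keep keys seen more than once.
GROUP_BY_SQL = """
select id, count(*) as n_records
from events
group by id
having count(*) > 1
order by id
"""

# New pattern. On BigQuery this would be `qualify row_number() over (partition by id) > 1`;
# SQLite lacks QUALIFY, so the same filter is applied in an outer query.
ROW_NUMBER_SQL = """
select id
from (
    select id, row_number() over (partition by id) as rn
    from events
)
where rn > 1
order by id
"""


def load_events(ids: Iterable[int]) -> sqlite3.Connection:
    """Build an in-memory events table to run both queries against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table events (id integer)")
    conn.executemany("insert into events (id) values (?)", [(i,) for i in ids])
    return conn


def failing_rows(conn: sqlite3.Connection, sql: str) -> List[Tuple]:
    """By default a dbt test fails when its query returns any rows."""
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = load_events([1, 2, 2, 3, 3, 3])
    print(failing_rows(conn, GROUP_BY_SQL))    # [(2, 2), (3, 3)]
    print(failing_rows(conn, ROW_NUMBER_SQL))  # [(2,), (3,), (3,)]
```

Both queries return rows exactly when some key is duplicated, so the test outcome is unchanged. The reported failure count does differ: the old query returns one row per duplicated key, while the ROW_NUMBER() version returns every surplus row (n - 1 per key). For incremental_unique_combination_of_columns the same idea applies with all of the key columns listed in PARTITION BY.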

The anomaly detection tests were not measuring anything actionable and have been removed in favor of Datafold tests. The rest of the singular tests can be removed after confirming they run properly in Datafold.

Known limitations

This does not refactor any specific model configurations that use these tests. We will need to audit greater_than_equal_to usage and remove or adjust it as necessary.

This also does not refactor any references to airflow_start_timestamp, which is another env var passed from Airflow at runtime. That can be done at a future date.

sydneynotthecity requested a review from a team as a code owner on April 8, 2026 17:24