-
Notifications
You must be signed in to change notification settings - Fork 123
fix: set explicit incremental_strategy='append' for test_result_rows.sql #918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The model no longer has a unique_key (removed in f5c062d), so using merge incremental strategy causes Redshift to fail with: 'MERGE statement with a single WHEN clause is not enabled.' Removing incremental_strategy lets dbt use the default append strategy which is appropriate for this append-only table. Fixes CORE-279 Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
👋 @devin-ai-integration[bot] |
📝 WalkthroughWalkthroughThe model Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
The elementary-data package now requires Python >=3.10, so update the test workflow to use Python 3.10 instead of 3.9. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…s.sql Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…ables - Creates new macro that returns 'append' for Athena and None for other adapters - Uses this macro in test_result_rows.sql instead of hardcoded 'append' - Fixes Athena's default insert_overwrite strategy issue while allowing other adapters to use their defaults Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@macros/utils/cross_db_utils/incremental_strategy.sql`:
- Around line 21-31: The default implementation of
get_append_only_incremental_strategy currently returns none
(default__get_append_only_incremental_strategy), which lets non-Athena adapters
fall back to their own defaults (e.g., merge) and causes test_result_rows.sql to
behave inconsistently; change default__get_append_only_incremental_strategy to
explicitly return the string "append" so get_append_only_incremental_strategy
(and athena__get_append_only_incremental_strategy) uniformly indicate
append-only semantics across all adapters.
| {% macro get_append_only_incremental_strategy() %} | ||
| {% do return(adapter.dispatch("get_append_only_incremental_strategy", "elementary")()) %} | ||
| {% endmacro %} | ||
|
|
||
| {%- macro athena__get_append_only_incremental_strategy() %} | ||
| {% do return("append") %} | ||
| {% endmacro %} | ||
|
|
||
| {% macro default__get_append_only_incremental_strategy() %} | ||
| {% do return(none) %} | ||
| {% endmacro %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Find all usages of get_append_only_incremental_strategy in the codebase
rg "get_append_only_incremental_strategy" -A 3 -B 3Repository: elementary-data/dbt-data-reliability
Length of output: 1724
🏁 Script executed:
# Also check the context of incremental_strategy.sql to understand the broader pattern
cat -n macros/utils/cross_db_utils/incremental_strategy.sqlRepository: elementary-data/dbt-data-reliability
Length of output: 1203
🏁 Script executed:
# Search for where incremental_strategy is configured or set in the dbt project
fd -t f -e sql -e yml | xargs rg "incremental_strategy" | head -30Repository: elementary-data/dbt-data-reliability
Length of output: 3335
🏁 Script executed:
# Check test_result_rows.sql for unique_key configuration
cat -n models/edr/run_results/test_result_rows.sqlRepository: elementary-data/dbt-data-reliability
Length of output: 1177
🏁 Script executed:
# Search for all config blocks in that file to see if unique_key is defined
rg -A 10 "config(" models/edr/run_results/test_result_rows.sqlRepository: elementary-data/dbt-data-reliability
Length of output: 144
🌐 Web query:
dbt incremental_strategy none default behavior merge fallback
💡 Result:
- In dbt, if you don’t set
incremental_strategy(i.e., it’s effectivelyNone), dbt calls the adapter’s default incremental SQL macro (get_incremental_default_sql). The actual behavior is adapter-specific (it is not universallymerge). [1], [2] - Many warehouse adapters set their default to
merge(for example: Snowflake and BigQuery). [3], [4] - “Merge fallback” commonly happens via
unique_keyrules:- If you configure a strategy like
merge/delete+insert/microbatchbut omitunique_key, dbt generally falls back toappend. [5] - BigQuery is an exception:
mergeon BigQuery requiresunique_key; leaving it out can be ambiguous or fail rather than cleanly falling back. [5]
- If you configure a strategy like
So: incremental_strategy: None ⇒ “use the adapter default,” which may be merge on some platforms, but not all; and separately, missing unique_key can force/trigger an append fallback (with BigQuery caveats). [3], [4], [5]
Sources:
[1] dbt-labs/docs.getdbt.com#1761
[2] https://docs.getdbt.com/docs/build/incremental-strategy
[3] https://docs.getdbt.com/reference/resource-configs/snowflake-configs
[4] https://docs.getdbt.com/reference/resource-configs/bigquery-configs
[5] https://docs.getdbt.com/reference/resource-configs/unique_key
🏁 Script executed:
# Verify if there are any comments or documentation in the file about why append-only strategy is used
cat -n models/edr/run_results/test_result_rows.sql | head -30Repository: elementary-data/dbt-data-reliability
Length of output: 1177
🌐 Web query:
dbt incremental_strategy append vs merge difference unique_key requirement
💡 Result:
append vs merge in dbt incremental models
append
- Behavior: Insert-only — dbt runs an
INSERTof the incremental query results into the target table; it does not update or delete existing rows. - Duplicates: dbt doesn’t check for existing rows, so duplicates can accumulate if the same records appear again. [2]
unique_key: Not required (and not used for upserts). If you don’t setunique_key, dbt/your adapter effectively uses append behavior by default. [1]
merge
- Behavior: Upsert — dbt updates existing rows and inserts new rows by matching on
unique_key(SCD Type 1–like overwrite semantics). [2] - Duplicates: With a valid
unique_key,mergehelps prevent duplicates by updating matched keys instead of inserting again. [2] unique_key: Required for true merge/upsert behavior (it’s the match key). If you configuremergewithoutunique_key, it behaves likeappend(adapter fallback). [1], [2]- BigQuery note: for
incremental_strategy='merge', you must provideunique_key(otherwise behavior is ambiguous or can fail). [1]
- BigQuery note: for
Rule of thumb
- Use
appendwhen rows are immutable (event/log-style) and duplicates are acceptable/handled downstream. [2] - Use
mergewhen you need the target table to reflect the latest version per key (upserts). [2]
Sources: dbt docs on unique_key and incremental strategies. [1], [2]
Default macro returns none, causing non-Athena adapters to rely on adapter-specific defaults without explicit intent.
test_result_rows.sql (the only consumer of this macro) lacks a unique_key and is designed for append-only behavior. Returning none defers to adapter defaults—which vary (Snowflake and BigQuery typically default to merge)—rather than explicitly stating the append-only strategy. This creates implicit coupling to adapter behavior and ambiguity about incremental semantics.
Return "append" explicitly to match the macro's name and intent across all adapters:
🔧 Suggested fix
{% macro default__get_append_only_incremental_strategy() %}
- {% do return(none) %}
+ {% do return("append") %}
{% endmacro %}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {% macro get_append_only_incremental_strategy() %} | |
| {% do return(adapter.dispatch("get_append_only_incremental_strategy", "elementary")()) %} | |
| {% endmacro %} | |
| {%- macro athena__get_append_only_incremental_strategy() %} | |
| {% do return("append") %} | |
| {% endmacro %} | |
| {% macro default__get_append_only_incremental_strategy() %} | |
| {% do return(none) %} | |
| {% endmacro %} | |
| {% macro get_append_only_incremental_strategy() %} | |
| {% do return(adapter.dispatch("get_append_only_incremental_strategy", "elementary")()) %} | |
| {% endmacro %} | |
| {%- macro athena__get_append_only_incremental_strategy() %} | |
| {% do return("append") %} | |
| {% endmacro %} | |
| {% macro default__get_append_only_incremental_strategy() %} | |
| {% do return("append") %} | |
| {% endmacro %} |
🤖 Prompt for AI Agents
In `@macros/utils/cross_db_utils/incremental_strategy.sql` around lines 21 - 31,
The default implementation of get_append_only_incremental_strategy currently
returns none (default__get_append_only_incremental_strategy), which lets
non-Athena adapters fall back to their own defaults (e.g., merge) and causes
test_result_rows.sql to behave inconsistently; change
default__get_append_only_incremental_strategy to explicitly return the string
"append" so get_append_only_incremental_strategy (and
athena__get_append_only_incremental_strategy) uniformly indicate append-only
semantics across all adapters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually fine with this, as in practice we update this table via post-hook / on-run-end hook, and the model query itself is always empty.
I just need it not to fail - which is why I have the setting for Athena.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! The macro approach ensures Athena gets the explicit append strategy it needs while other adapters use their defaults. Ready to merge whenever you'd like - just note that the CI will continue to fail until merged due to the Python 3.9 → 3.10 chicken-and-egg issue with the reusable workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skipped: comment is from another GitHub bot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm actually I have doubts, maybe I'll do as you say
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@haritamar Thanks for reconsidering! I think returning "append" explicitly is the safer choice here, even though the model query is currently empty. Here's why:
Benefits of explicit "append":
- Semantic clarity – The macro is named
get_append_only_incremental_strategy, so returning"append"matches its documented intent - Decouples from adapter defaults – You won't be implicitly relying on varied adapter behavior (Snowflake/BigQuery default to
merge, others may differ) - Future-proof – If the model query ever becomes non-empty or hooks change, the append-only semantics are already guaranteed
- Consistency – All adapters (including Athena) will explicitly use the same append-only strategy
Current risk with none:
- Redshift, Snowflake, and BigQuery default to
mergewhen strategy is unspecified - While they may fall back to
appendwhenunique_keyis missing, this fallback behavior isn't guaranteed across all adapter versions and can be ambiguous
The suggested fix in my previous comment explicitly returns "append" for the default implementation, which aligns the behavior across all adapters with the macro's intended purpose. Let me know if you'd like to proceed with that change!
🧠 Learnings used
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 825
File: macros/utils/cross_db_utils/incremental_strategy.sql:13-15
Timestamp: 2025-10-05T16:25:01.719Z
Learning: In macros/utils/cross_db_utils/incremental_strategy.sql, the get_default_incremental_strategy() macro and its adapter-specific variants are used exclusively for Elementary's internal dbt artifact tables, all of which have unique keys defined. Therefore, defaulting to "merge" strategy for Redshift (and other adapters) is safe in this context.
Learnt from: GuyEshdat
Repo: elementary-data/dbt-data-reliability PR: 838
File: macros/utils/table_operations/delete_and_insert.sql:138-146
Timestamp: 2025-08-10T11:29:06.982Z
Learning: In the Elementary dbt-data-reliability codebase, the team prefers to add the `escape_reserved_keywords` macro only in places where they've actually encountered errors with reserved keywords, rather than preemptively adding it everywhere. They rely on E2E tests to catch future issues if reserved keywords cause problems in untested code paths. This pragmatic approach avoids unnecessary defensive programming and keeps the codebase simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the macro, but:
- Make it append for everything other than BigQuery (which doesn't support it)
- For BigQuery - merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Updated the macro:
- Default now returns
'append'for all adapters - BigQuery-specific override returns
'merge'(since BigQuery doesn't support append)
Pushed commit 9cd962a.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skipped: comment is from another GitHub bot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🌐 Web query:
BigQuery dbt incremental strategy append support
💡 Result:
Yes—BigQuery supports dbt’s append incremental strategy.
appendis the default incremental behavior when you don’t specify anincremental_strategy(and typically when you don’t setunique_key). dbt will simplyINSERT INTOthe target table. This does not update/delete existing rows and can create duplicates. [1], [2]- You can also explicitly set it:
{{ config(materialized='incremental', incremental_strategy='append') }}
select ...
{% if is_incremental() %}
where ...
{% endif %}- On BigQuery, if you want updates/upserts you generally use
merge(requiresunique_key), or partition-replacement strategies likeinsert_overwrite/microbatch(partition-based,unique_keynot required). [2], [3]
Sources:
[1] dbt docs: Incremental strategies (append) https://docs.getdbt.com/docs/build/incremental-strategy
[2] dbt docs: unique_key defaulting behavior and relationship to append https://docs.getdbt.com/reference/resource-configs/unique_key
[3] dbt-bigquery incremental strategy behavior overview (adapter implementation summary) https://deepwiki.com/dbt-labs/dbt-bigquery/3.2-incremental-materialization-strategies
@haritamar, I checked the dbt-bigquery documentation and found that BigQuery does support the append incremental strategy—it's actually the default behavior when no strategy is specified.
However, I want to clarify your requirements before implementing:
Potential concern: Setting incremental_strategy='merge' for BigQuery typically requires a unique_key, which test_result_rows.sql doesn't have. Without a unique_key, the behavior may fall back to append anyway, or could cause unexpected behavior.
Questions:
- Did you encounter a specific issue with
appendon BigQuery for this model? - Since the model query is empty (per your earlier comment), does the strategy choice actually affect BigQuery's behavior in practice?
If you still want to proceed with merge for BigQuery (perhaps for consistency with other BigQuery models or to match your conventions), I'm happy to implement it. Just want to make sure we're solving the right problem!
Would you like me to:
- A) Implement as requested:
"append"for default,"merge"for BigQuery - B) Use
"append"uniformly across all adapters (including BigQuery) - C) Something else?
🧠 Learnings used
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 825
File: macros/utils/cross_db_utils/incremental_strategy.sql:13-15
Timestamp: 2025-10-05T16:25:01.719Z
Learning: In macros/utils/cross_db_utils/incremental_strategy.sql, the get_default_incremental_strategy() macro and its adapter-specific variants are used exclusively for Elementary's internal dbt artifact tables, all of which have unique keys defined. Therefore, defaulting to "merge" strategy for Redshift (and other adapters) is safe in this context.
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 918
File: macros/utils/cross_db_utils/incremental_strategy.sql:21-31
Timestamp: 2026-01-27T21:01:09.183Z
Learning: In `models/edr/run_results/test_result_rows.sql`, the model query is always empty and the table is updated via post-hook/on-run-end hook. The `incremental_strategy` configuration using `get_append_only_incremental_strategy()` is only needed to prevent Athena from failing with its default `insert_overwrite` strategy. For other adapters, returning `none` (to use adapter defaults) is acceptable because the empty query doesn't trigger incremental logic.
Learnt from: GuyEshdat
Repo: elementary-data/dbt-data-reliability PR: 838
File: macros/utils/table_operations/delete_and_insert.sql:138-146
Timestamp: 2025-08-10T11:29:06.982Z
Learning: In the Elementary dbt-data-reliability codebase, the team prefers to add the `escape_reserved_keywords` macro only in places where they've actually encountered errors with reserved keywords, rather than preemptively adding it everywhere. They rely on E2E tests to catch future issues if reserved keywords cause problems in untested code paths. This pragmatic approach avoids unnecessary defensive programming and keeps the codebase simpler.
… all except BigQuery - Default now returns 'append' for all adapters - BigQuery-specific override returns 'merge' (BigQuery doesn't support append) - Removed Athena-specific override (now covered by default) Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Summary
Changes the
incremental_strategyconfig intest_result_rows.sqlfromelementary.get_default_incremental_strategy()to an explicit'append'.This model's
unique_keywas removed in f5c062d (#908), but theincremental_strategywas left in place. On Redshift (and potentially other warehouses),get_default_incremental_strategy()returnsmerge, which fails without a unique_key because Redshift doesn't support MERGE statements with only a single WHEN clause:Updates since last revision
Initially removed the
incremental_strategyconfig entirely to let dbt use defaults. However, Athena's default incremental strategy isinsert_overwrite, which doesn't work for this append-only table. Now explicitly settingincremental_strategy='append'to ensure consistent behavior across all warehouse platforms.Also updated CI to use Python 3.10 (from 3.9) to fix package compatibility issues with
elementary-data.Review & Testing Checklist for Human
'append'is a valid incremental strategy for all supported warehouse adapters (Redshift, Snowflake, BigQuery, Databricks, Athena, Trino, ClickHouse, Dremio)test_result_rows(no deduplication needed since unique_key was intentionally removed)Notes
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.