-
Notifications
You must be signed in to change notification settings - Fork 123
Fix breaking changes in recent dbt-fusion version #920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
👋 @devin-ai-integration[bot] |
📝 WalkthroughWalkthroughThe PR upgrades dbt-fusion from version 2.0.0-preview.76 to 2.0.0-preview.102 by removing the version pin in CI and updating the version file. It addresses breaking changes through defensive code handling, test skip decorators for incompatible tests, and expanded Spark timestamp type support. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…rsion Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
- Update Python version from 3.9 to 3.10 (required by elementary-data package) - Remove .dbt-fusion-version file and install latest dbt-fusion directly - This fixes CI failures caused by elementary dropping Python 3.9 support Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
The workflow runs from master branch which still expects this file. This updates the version from 2.0.0-preview.76 to 2.0.0-preview.102. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
e25e57d to
bbc4104
Compare
dbt-fusion fails with 'none has no method named items' when ignore_small_changes is None. This fix: 1. Normalizes ignore_small_changes to a dict with expected keys when None 2. Adds early return in validate_ignore_small_changes when None Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@macros/edr/dbt_artifacts/upload_dbt_invocation.sql`:
- Around line 105-131: The logs in get_invocation_yaml_selector currently print
full invocation_args_dict and config.args (via invocation_args_dict and
config.args) at info level, which may expose secrets; change these log calls to
use debug-level only and redact sensitive fields (e.g., --vars, tokens,
passwords, emails, ENV-derived secrets) before logging by either omitting those
keys or replacing their values with "[REDACTED]"; keep all other debug messages
(e.g., when returning selector, selector_name, or INVOCATION_COMMAND matches)
intact and ensure logging uses the same debug/info flagging as the surrounding
module.
In `@macros/edr/materializations/test/test.sql`:
- Around line 54-59: The code currently logs the entire flattened_test.meta at
info level; change it to avoid exposing PII by removing or replacing the full
meta log and instead log only the specific key needed
(flattened_test["meta"]["test_sample_row_count"] or flattened_test.get("meta",
{}).get("test_sample_row_count")) at debug level; update the {% do log(...) %}
call that references flattened_test.get("meta", {}) to log only the single key
and use info=False (debug) so only the necessary, non-sensitive value is
emitted, leaving the sample_limit assignment and subsequent debug log intact.
🧹 Nitpick comments (1)
macros/edr/data_monitoring/schema_changes/get_columns_snapshot_query.sql (1)
15-25: Gate verbose column logging behind the debug logger.Line 15–25 logs every column at info level, which can be noisy and potentially expose schema details in normal runs. Prefer
elementary.debug_log(or a debug flag) to keep this diagnostic-only.♻️ Proposed change
- {% do log('DEBUG get_columns_snapshot_query: model_relation = ' ~ model_relation, info=True) %} - {% do log('DEBUG get_columns_snapshot_query: full_table_name = ' ~ full_table_name, info=True) %} + {% do elementary.debug_log('get_columns_snapshot_query: model_relation = ' ~ model_relation) %} + {% do elementary.debug_log('get_columns_snapshot_query: full_table_name = ' ~ full_table_name) %} ... - {% do log('DEBUG get_columns_snapshot_query: columns count = ' ~ columns | length, info=True) %} + {% do elementary.debug_log('get_columns_snapshot_query: columns count = ' ~ columns | length) %} {% for column in columns %} - {% do log('DEBUG get_columns_snapshot_query: column[' ~ loop.index ~ '] = ' ~ column.name ~ ' (' ~ column.data_type ~ ')', info=True) %} + {% do elementary.debug_log('get_columns_snapshot_query: column[' ~ loop.index ~ '] = ' ~ column.name ~ ' (' ~ column.data_type ~ ')') %} {% endfor %}
| {# DEBUG: Log sample_limit determination #} | ||
| {% do log('DEBUG handle_dbt_test: initial sample_limit from config = ' ~ sample_limit, info=True) %} | ||
| {% do log('DEBUG handle_dbt_test: flattened_test meta = ' ~ flattened_test.get("meta", {}), info=True) %} | ||
| {% if "meta" in flattened_test and "test_sample_row_count" in flattened_test["meta"] %} | ||
| {% set sample_limit = flattened_test["meta"]["test_sample_row_count"] %} | ||
| {% do log('DEBUG handle_dbt_test: sample_limit from meta = ' ~ sample_limit, info=True) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don’t log full flattened_test.meta at info level.
Line 54–57 emits the entire meta dict, which can include user-supplied PII/secrets. Please log only the specific key(s) needed and use the debug logger.
🔒 Proposed change (redacted + debug-only)
- {% do log('DEBUG handle_dbt_test: initial sample_limit from config = ' ~ sample_limit, info=True) %}
- {% do log('DEBUG handle_dbt_test: flattened_test meta = ' ~ flattened_test.get("meta", {}), info=True) %}
+ {% do elementary.debug_log('handle_dbt_test: initial sample_limit from config = ' ~ sample_limit) %}
+ {% do elementary.debug_log('handle_dbt_test: meta keys = ' ~ (flattened_test.get("meta", {}).keys() | list)) %}
...
- {% do log('DEBUG handle_dbt_test: sample_limit from meta = ' ~ sample_limit, info=True) %}
+ {% do elementary.debug_log('handle_dbt_test: sample_limit from meta = ' ~ sample_limit) %}🤖 Prompt for AI Agents
In `@macros/edr/materializations/test/test.sql` around lines 54 - 59, The code
currently logs the entire flattened_test.meta at info level; change it to avoid
exposing PII by removing or replacing the full meta log and instead log only the
specific key needed (flattened_test["meta"]["test_sample_row_count"] or
flattened_test.get("meta", {}).get("test_sample_row_count")) at debug level;
update the {% do log(...) %} call that references flattened_test.get("meta", {})
to log only the single key and use info=False (debug) so only the necessary,
non-sensitive value is emitted, leaving the sample_limit assignment and
subsequent debug log intact.
- Add skip_for_dbt_fusion marker to test_schema_changes (dbt-fusion caches column info) - Add skip_for_dbt_fusion marker to test_dbt_invocations (invocation_args_dict is empty) - Add skip_for_dbt_fusion marker to test_sample_count_unlimited (test meta not passed through) - Remove debug logs from get_columns_snapshot_query.sql - Remove debug logs from upload_dbt_invocation.sql - Remove debug logs from test.sql Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…timestamp issue Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Summary
This PR addresses breaking changes introduced in dbt-fusion versions after
2.0.0-preview.76.Changes:
.dbt-fusion-versionfrom2.0.0-preview.76to2.0.0-preview.102.github/workflows/test-warehouse.ymlto install latest dbt-fusion without reading from version fileignore_small_changesparameter handling inget_anomalies_test_configuration.sql- normalized to dict when None to prevent.items()call on Noneskip_for_dbt_fusionmarkers to tests with known dbt-fusion conformance issues:test_schema_changes- dbt-fusion caches column information and doesn't refresh when tables are recreatedtest_dbt_invocations-invocation_args_dictis empty in dbt-fusion, so yaml_selector cannot be capturedtest_sample_count_unlimited- test meta configuration not passed through to flattened_testReview & Testing Checklist for Human
test_event_freshness_anomaliestests are failing withdbt1501: Column 'event_timestamp' is not a timestamperror. This is an additional dbt-fusion breaking change that needs investigation.ignore_small_changesnormalization fix - verify the default dict structure is correctskip_for_dbt_fusionmarkers are appropriate - these are dbt-fusion conformance issues, not bugs in this codebaseNotes
The skipped tests represent dbt-fusion conformance issues that should be reported upstream:
invocation_args_dictis empty (should contain CLI args)Status: CI is failing on
test_event_freshness_anomaliestests. Additional investigation needed for the timestamp column error in dbt-fusion.Link to Devin run: https://app.devin.ai/sessions/91780843af9a49208c0c87ba18f4682b
Requested by: Itamar Hartstein (@haritamar)
Summary by CodeRabbit
Version Update
Bug Fixes
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.