-
Notifications
You must be signed in to change notification settings - Fork 123
fix: filter dbt models to upload only current project #793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Thanks for your contribution! |
|
I haven't seen in contributing guide a section about flag. Could you explain to me what is required ? |
|
Since it changes the behavior, we want it to be an option that the user actively chooses. Either by configuration, or by an argument that the program receives. With the default being set to the original behavior. |
|
This does not change current behavior. |
|
This pull request is stale because it has been open for too long with no activity. |
elazarlachkar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @cedric-orange!
I'm sorry for the delayed response.
Your PR looks good to me and I would like to merge it, I just commented with one question.
Could you look at it, and also update your branch to the latest elementary-data:master for the CI to pass?
Also, is that relevant also to groups and metrics artifacts?
|
This pull request is stale because it has been open for too long with no activity. |
elazarlachkar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cedric-orange very small comments
📝 WalkthroughWalkthroughAdded Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@macros/edr/dbt_artifacts/upload_dbt_exposures.sql`:
- Line 4: The current filter on graph.exposures in the upload_dbt_exposures
macro (exposures = graph.exposures.values() | selectattr('resource_type', '==',
'exposure') | selectattr('package_name', '==', project_name)) force-scopes
exposures to the current project and must be made opt-in; add a boolean
config/flag (e.g., a macro arg or dbt var like scope_exposures_to_project
defaulting to false) and only apply the selectattr('package_name', '==',
project_name') filter when that flag is true, leaving the original exposure
selection by resource_type unchanged when the flag is false so existing behavior
remains intact.
In `@macros/edr/dbt_artifacts/upload_dbt_models.sql`:
- Line 4: The current filtering expression that sets models via
graph.nodes.values() | selectattr('resource_type','==','model') |
selectattr('package_name','==', project_name) forces project-only scoping and
must be made opt-in; add a config flag using
elementary.get_config_var('edr_upload_scope_project', default=false) and
conditionally apply the package_name == project_name selectattr only when that
flag is true (update upload_dbt_models.sql and apply the same guard in the other
artifact upload macros), keeping the default behavior to include dependency
packages.
In `@macros/edr/dbt_artifacts/upload_dbt_seeds.sql`:
- Line 4: The change currently filters seeds by package_name unconditionally;
make this project-scoping behavior opt-in by adding a boolean flag (e.g.,
scope_seeds_to_project defaulting to false) and apply the package_name filter
only when that flag is true: modify the seeds jinja expression in
upload_dbt_seeds.sql to conditionally include "| selectattr('package_name',
'==', project_name)" when scope_seeds_to_project is true (keep the original
behavior when false), and ensure the flag is documented/defined where macro
variables are declared so callers can opt in.
In `@macros/edr/dbt_artifacts/upload_dbt_snapshots.sql`:
- Line 4: The current assignment to snapshots filters out package snapshots by
comparing package_name == project_name; make this scoping opt-in by adding a
macro parameter or config flag (e.g., scope_to_project defaulting to false) and
apply the selectattr('package_name','==', project_name) filter only when that
flag is true; update the snapshots variable construction (the line that sets
snapshots) to conditionally include the package_name filter and document the new
parameter.
In `@macros/edr/dbt_artifacts/upload_dbt_sources.sql`:
- Line 4: The change to how sources is computed now filters by package_name ==
project_name and must be made opt-in: add a config flag (e.g., var or macro
argument like scope_project_sources defaulting to false) and only apply the
selectattr('package_name', '==', project_name) filter when that flag is true,
otherwise keep the original behavior (do not filter by package_name); update the
computation of the sources variable (the expression that uses
graph.sources.values() and selectattr('resource_type', '==', 'source')) to
conditionally append the package_name filter based on the new flag and document
the new flag in the macro signature/usage.
In `@macros/edr/dbt_artifacts/upload_dbt_tests.sql`:
- Line 4: The current line that defines tests by filtering graph.nodes to
resource_type == 'test' and package_name == project_name introduces a behavior
change by excluding dependency tests; make this scoping opt-in by adding a
configuration flag (e.g., scope_tests_to_project defaulting to false) and only
apply the package_name == project_name filter when that flag is true; update the
tests set definition (the {% set tests = ... %} expression) to conditionally
include the package_name filter based on that flag and ensure the same opt-in
guard is added wherever artifact uploads perform project-scoped filtering so all
artifact uploads behave consistently.
| {% set relation = elementary.get_elementary_relation('dbt_exposures') %} | ||
| {% if execute and relation %} | ||
| {% set exposures = graph.exposures.values() | selectattr('resource_type', '==', 'exposure') %} | ||
| {% set exposures = graph.exposures.values() | selectattr('resource_type', '==', 'exposure') | selectattr('package_name', '==', project_name) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make project scoping opt‑in (behavior change).
Line 4 drops exposures from packages; this should be behind a config/flag (default off) to avoid changing behavior for existing users.
🤖 Prompt for AI Agents
In `@macros/edr/dbt_artifacts/upload_dbt_exposures.sql` at line 4, The current
filter on graph.exposures in the upload_dbt_exposures macro (exposures =
graph.exposures.values() | selectattr('resource_type', '==', 'exposure') |
selectattr('package_name', '==', project_name)) force-scopes exposures to the
current project and must be made opt-in; add a boolean config/flag (e.g., a
macro arg or dbt var like scope_exposures_to_project defaulting to false) and
only apply the selectattr('package_name', '==', project_name') filter when that
flag is true, leaving the original exposure selection by resource_type unchanged
when the flag is false so existing behavior remains intact.
| {% set relation = elementary.get_elementary_relation('dbt_models') %} | ||
| {% if execute and relation %} | ||
| {% set models = graph.nodes.values() | selectattr('resource_type', '==', 'model') %} | ||
| {% set models = graph.nodes.values() | selectattr('resource_type', '==', 'model' ) | selectattr('package_name', '==', project_name) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make project scoping opt‑in (behavior change).
Line 4 excludes models from packages/dependencies. This is a behavioral change and should be gated behind a config/flag with a default that keeps current behavior. You can use the existing elementary.get_config_var(...) pattern in this file and apply the same guard to all artifact uploads.
💡 Suggested pattern (apply across all upload macros)
-{% set models = graph.nodes.values() | selectattr('resource_type', '==', 'model' ) | selectattr('package_name', '==', project_name) %}
+{% set models = graph.nodes.values() | selectattr('resource_type', '==', 'model') %}
+{% if elementary.get_config_var('limit_artifacts_to_current_project') %}
+ {% set models = models | selectattr('package_name', '==', project_name) %}
+{% endif %}🤖 Prompt for AI Agents
In `@macros/edr/dbt_artifacts/upload_dbt_models.sql` at line 4, The current
filtering expression that sets models via graph.nodes.values() |
selectattr('resource_type','==','model') | selectattr('package_name','==',
project_name) forces project-only scoping and must be made opt-in; add a config
flag using elementary.get_config_var('edr_upload_scope_project', default=false)
and conditionally apply the package_name == project_name selectattr only when
that flag is true (update upload_dbt_models.sql and apply the same guard in the
other artifact upload macros), keeping the default behavior to include
dependency packages.
| {% set relation = elementary.get_elementary_relation('dbt_seeds') %} | ||
| {% if execute and relation %} | ||
| {% set seeds = graph.nodes.values() | selectattr('resource_type', '==', 'seed') %} | ||
| {% set seeds = graph.nodes.values() | selectattr('resource_type', '==', 'seed') | selectattr('package_name', '==', project_name) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make project scoping opt‑in (behavior change).
Line 4 now excludes package seeds; this is a behavioral change. Please guard it behind an explicit flag (default off to preserve current behavior), matching the requested PR approach.
🤖 Prompt for AI Agents
In `@macros/edr/dbt_artifacts/upload_dbt_seeds.sql` at line 4, The change
currently filters seeds by package_name unconditionally; make this
project-scoping behavior opt-in by adding a boolean flag (e.g.,
scope_seeds_to_project defaulting to false) and apply the package_name filter
only when that flag is true: modify the seeds jinja expression in
upload_dbt_seeds.sql to conditionally include "| selectattr('package_name',
'==', project_name)" when scope_seeds_to_project is true (keep the original
behavior when false), and ensure the flag is documented/defined where macro
variables are declared so callers can opt in.
| {% set relation = elementary.get_elementary_relation('dbt_snapshots') %} | ||
| {% if execute and relation %} | ||
| {% set snapshots = graph.nodes.values() | selectattr('resource_type', '==', 'snapshot') %} | ||
| {% set snapshots = graph.nodes.values() | selectattr('resource_type', '==', 'snapshot') | selectattr('package_name', '==', project_name) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make project scoping opt‑in (behavior change).
Line 4 excludes package snapshots; this is a behavior change and should be guarded behind a flag with default preserving current behavior.
🤖 Prompt for AI Agents
In `@macros/edr/dbt_artifacts/upload_dbt_snapshots.sql` at line 4, The current
assignment to snapshots filters out package snapshots by comparing package_name
== project_name; make this scoping opt-in by adding a macro parameter or config
flag (e.g., scope_to_project defaulting to false) and apply the
selectattr('package_name','==', project_name) filter only when that flag is
true; update the snapshots variable construction (the line that sets snapshots)
to conditionally include the package_name filter and document the new parameter.
| {% set relation = elementary.get_elementary_relation('dbt_sources') %} | ||
| {% if execute and relation %} | ||
| {% set sources = graph.sources.values() | selectattr('resource_type', '==', 'source') %} | ||
| {% set sources = graph.sources.values() | selectattr('resource_type', '==', 'source') | selectattr('package_name', '==', project_name) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make project scoping opt‑in (behavior change).
Line 4 now excludes package sources; please gate this behind an explicit configuration flag (default off) to preserve existing behavior.
🤖 Prompt for AI Agents
In `@macros/edr/dbt_artifacts/upload_dbt_sources.sql` at line 4, The change to how
sources is computed now filters by package_name == project_name and must be made
opt-in: add a config flag (e.g., var or macro argument like
scope_project_sources defaulting to false) and only apply the
selectattr('package_name', '==', project_name) filter when that flag is true,
otherwise keep the original behavior (do not filter by package_name); update the
computation of the sources variable (the expression that uses
graph.sources.values() and selectattr('resource_type', '==', 'source')) to
conditionally append the package_name filter based on the new flag and document
the new flag in the macro signature/usage.
| {% set relation = elementary.get_elementary_relation('dbt_tests') %} | ||
| {% if execute and relation %} | ||
| {% set tests = graph.nodes.values() | selectattr('resource_type', '==', 'test') %} | ||
| {% set tests = graph.nodes.values() | selectattr('resource_type', '==', 'test') | selectattr('package_name', '==', project_name) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make project scoping opt‑in (behavior change).
Line 4 filters out tests from packages/dependencies. That’s a behavior change; per PR discussion this should be gated by a config/flag with a default that preserves current behavior. Please add an opt‑in guard and apply consistently across all artifact uploads.
🤖 Prompt for AI Agents
In `@macros/edr/dbt_artifacts/upload_dbt_tests.sql` at line 4, The current line
that defines tests by filtering graph.nodes to resource_type == 'test' and
package_name == project_name introduces a behavior change by excluding
dependency tests; make this scoping opt-in by adding a configuration flag (e.g.,
scope_tests_to_project defaulting to false) and only apply the package_name ==
project_name filter when that flag is true; update the tests set definition (the
{% set tests = ... %} expression) to conditionally include the package_name
filter based on that flag and ensure the same opt-in guard is added wherever
artifact uploads perform project-scoped filtering so all artifact uploads behave
consistently.
|
@cedric-orange thanks for resolving my comments. I now realised this PR changes the behavior a bit, as currently the artifacts include also artifacts from dbt package dependencies (for example, if you use dbt_project_evaluator as a dependency, its models will be included). If this change is not Ok, we'll need to disable your feature by default, and add a flag which enables it. |
When you are using some tools like dbt-loom,
graph.nodes.values()returns nodes from multiples project.dbt-loom works by fetching public model definitions from your dbt artifacts, and injecting those models into your dbt project.
The goal of this PR is to only retrieve nodes from the current project using project_name selection to avoid uploading of unnecessary information.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.