@cedric-orange cedric-orange commented Mar 14, 2025

When you use tools like dbt-loom, graph.nodes.values() returns nodes from multiple projects.

dbt-loom works by fetching public model definitions from your dbt artifacts, and injecting those models into your dbt project.

The goal of this PR is to retrieve only the nodes that belong to the current project, by filtering on project_name, to avoid uploading unnecessary information.
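
To check whether nodes from other projects are present in a given setup, the package names found in the graph can be listed. This is a minimal sketch, not part of this PR: the macro name `list_graph_packages` is hypothetical, and it assumes `graph` is populated in the context where the macro runs.

```jinja
{# Hypothetical diagnostic macro; not part of this PR. #}
{% macro list_graph_packages() %}
  {% if execute %}
    {# Collect the distinct package_name values found on graph nodes. #}
    {% set package_names = graph.nodes.values() | map(attribute='package_name') | unique | list %}
    {% do log('Current project: ' ~ project_name, info=True) %}
    {% do log('Packages in graph.nodes: ' ~ (package_names | join(', ')), info=True) %}
  {% endif %}
{% endmacro %}
```

It could be invoked with `dbt run-operation list_graph_packages`. Any package name other than the current project's, whether injected by dbt-loom or installed as a dependency, identifies nodes that a `package_name == project_name` filter would skip.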

Summary by CodeRabbit

  • Bug Fixes
    • Improved artifact filtering to ensure all dbt artifacts (models, seeds, snapshots, sources, tests, columns, and exposures) are properly scoped to the current project during upload operations, preventing cross-project artifact mixing.

@michael-myaskovsky
Contributor

Thanks for your contribution!
Please add a flag for this behavior, after that I'll be able to merge.

@cedric-orange
Author

I haven't seen a section about flags in the contributing guide. Could you explain what is required?

@michael-myaskovsky
Contributor

Since it changes the behavior, we want it to be an option that the user actively chooses, either through configuration or through an argument passed to the program, with the default set to the original behavior.

@cedric-orange
Author

This does not change the current behavior.
It only prevents duplicates from being written to the database when dbt-loom is used.

@github-actions
Contributor

This pull request is stale because it has been open for too long with no activity.
If you would like the pull request to remain open, please remove the stale label or leave a comment.

@github-actions github-actions bot added the Stale label Nov 18, 2025
Contributor

@elazarlachkar elazarlachkar left a comment

Hi @cedric-orange!

I'm sorry for the delayed response.
Your PR looks good to me and I would like to merge it; I just left one question in a comment.
Could you look at it, and also update your branch to the latest elementary-data:master for the CI to pass?

Also, is this relevant to the groups and metrics artifacts as well?

@github-actions github-actions bot removed the Stale label Nov 21, 2025
@github-actions
Contributor

This pull request is stale because it has been open for too long with no activity.
If you would like the pull request to remain open, please remove the stale label or leave a comment.

@github-actions github-actions bot added the Stale label Jan 20, 2026
Contributor

@elazarlachkar elazarlachkar left a comment

@cedric-orange very small comments

@coderabbitai

coderabbitai bot commented Jan 20, 2026

📝 Walkthrough

Added package_name == project_name filtering across seven dbt artifact upload macros in the EDR module. This restricts artifact uploads to include only resources belonging to the current project, excluding external packages from the manifest graph.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Package-scoped artifact filtering: macros/edr/dbt_artifacts/upload_dbt_columns.sql, upload_dbt_exposures.sql, upload_dbt_models.sql, upload_dbt_seeds.sql, upload_dbt_snapshots.sql, upload_dbt_sources.sql, upload_dbt_tests.sql | Added package_name == project_name filter condition to each artifact type's selection logic, narrowing upload scope to project-specific resources only. Consistent pattern applied across all upload macros. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A filter's touch, so light and neat,
Seven files march in sync, keeping step and beat,
Package names aligned with project pride,
Only local treasures come along for the ride!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change across all modified files: filtering dbt artifacts (models, columns, exposures, seeds, snapshots, sources, and tests) to only upload items from the current project. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

{% set relation = elementary.get_elementary_relation('dbt_exposures') %}
{% if execute and relation %}
-{% set exposures = graph.exposures.values() | selectattr('resource_type', '==', 'exposure') %}
+{% set exposures = graph.exposures.values() | selectattr('resource_type', '==', 'exposure') | selectattr('package_name', '==', project_name) %}

⚠️ Potential issue | 🟠 Major

Make project scoping opt‑in (behavior change).

Line 4 drops exposures from packages; this should be behind a config/flag (default off) to avoid changing behavior for existing users.

🤖 Prompt for AI Agents
In `@macros/edr/dbt_artifacts/upload_dbt_exposures.sql` at line 4, The current
filter on graph.exposures in the upload_dbt_exposures macro (exposures =
graph.exposures.values() | selectattr('resource_type', '==', 'exposure') |
selectattr('package_name', '==', project_name)) force-scopes exposures to the
current project and must be made opt-in; add a boolean config/flag (e.g., a
macro arg or dbt var like scope_exposures_to_project defaulting to false) and
only apply the selectattr('package_name', '==', project_name) filter when that
flag is true, leaving the original exposure selection by resource_type unchanged
when the flag is false so existing behavior remains intact.

{% set relation = elementary.get_elementary_relation('dbt_models') %}
{% if execute and relation %}
-{% set models = graph.nodes.values() | selectattr('resource_type', '==', 'model') %}
+{% set models = graph.nodes.values() | selectattr('resource_type', '==', 'model' ) | selectattr('package_name', '==', project_name) %}

⚠️ Potential issue | 🟠 Major

Make project scoping opt‑in (behavior change).

Line 4 excludes models from packages/dependencies. This is a behavioral change and should be gated behind a config/flag with a default that keeps current behavior. You can use the existing elementary.get_config_var(...) pattern in this file and apply the same guard to all artifact uploads.

💡 Suggested pattern (apply across all upload macros)
-{% set models = graph.nodes.values() | selectattr('resource_type', '==', 'model' ) | selectattr('package_name', '==', project_name) %}
+{% set models = graph.nodes.values() | selectattr('resource_type', '==', 'model') %}
+{% if elementary.get_config_var('limit_artifacts_to_current_project') %}
+  {% set models = models | selectattr('package_name', '==', project_name) %}
+{% endif %}
🤖 Prompt for AI Agents
In `@macros/edr/dbt_artifacts/upload_dbt_models.sql` at line 4, The current
filtering expression that sets models via graph.nodes.values() |
selectattr('resource_type','==','model') | selectattr('package_name','==',
project_name) forces project-only scoping and must be made opt-in; add a config
flag using elementary.get_config_var('edr_upload_scope_project', default=false)
and conditionally apply the package_name == project_name selectattr only when
that flag is true (update upload_dbt_models.sql and apply the same guard in the
other artifact upload macros), keeping the default behavior to include
dependency packages.

{% set relation = elementary.get_elementary_relation('dbt_seeds') %}
{% if execute and relation %}
-{% set seeds = graph.nodes.values() | selectattr('resource_type', '==', 'seed') %}
+{% set seeds = graph.nodes.values() | selectattr('resource_type', '==', 'seed') | selectattr('package_name', '==', project_name) %}

⚠️ Potential issue | 🟠 Major

Make project scoping opt‑in (behavior change).

Line 4 now excludes package seeds; this is a behavioral change. Please guard it behind an explicit flag (default off to preserve current behavior), matching the requested PR approach.

🤖 Prompt for AI Agents
In `@macros/edr/dbt_artifacts/upload_dbt_seeds.sql` at line 4, The change
currently filters seeds by package_name unconditionally; make this
project-scoping behavior opt-in by adding a boolean flag (e.g.,
scope_seeds_to_project defaulting to false) and apply the package_name filter
only when that flag is true: modify the seeds jinja expression in
upload_dbt_seeds.sql to conditionally include "| selectattr('package_name',
'==', project_name)" when scope_seeds_to_project is true (keep the original
behavior when false), and ensure the flag is documented/defined where macro
variables are declared so callers can opt in.

{% set relation = elementary.get_elementary_relation('dbt_snapshots') %}
{% if execute and relation %}
-{% set snapshots = graph.nodes.values() | selectattr('resource_type', '==', 'snapshot') %}
+{% set snapshots = graph.nodes.values() | selectattr('resource_type', '==', 'snapshot') | selectattr('package_name', '==', project_name) %}

⚠️ Potential issue | 🟠 Major

Make project scoping opt‑in (behavior change).

Line 4 excludes package snapshots; this is a behavior change and should be guarded behind a flag with default preserving current behavior.

🤖 Prompt for AI Agents
In `@macros/edr/dbt_artifacts/upload_dbt_snapshots.sql` at line 4, The current
assignment to snapshots filters out package snapshots by comparing package_name
== project_name; make this scoping opt-in by adding a macro parameter or config
flag (e.g., scope_to_project defaulting to false) and apply the
selectattr('package_name','==', project_name) filter only when that flag is
true; update the snapshots variable construction (the line that sets snapshots)
to conditionally include the package_name filter and document the new parameter.

{% set relation = elementary.get_elementary_relation('dbt_sources') %}
{% if execute and relation %}
-{% set sources = graph.sources.values() | selectattr('resource_type', '==', 'source') %}
+{% set sources = graph.sources.values() | selectattr('resource_type', '==', 'source') | selectattr('package_name', '==', project_name) %}

⚠️ Potential issue | 🟠 Major

Make project scoping opt‑in (behavior change).

Line 4 now excludes package sources; please gate this behind an explicit configuration flag (default off) to preserve existing behavior.

🤖 Prompt for AI Agents
In `@macros/edr/dbt_artifacts/upload_dbt_sources.sql` at line 4, The change to how
sources is computed now filters by package_name == project_name and must be made
opt-in: add a config flag (e.g., var or macro argument like
scope_project_sources defaulting to false) and only apply the
selectattr('package_name', '==', project_name) filter when that flag is true,
otherwise keep the original behavior (do not filter by package_name); update the
computation of the sources variable (the expression that uses
graph.sources.values() and selectattr('resource_type', '==', 'source')) to
conditionally append the package_name filter based on the new flag and document
the new flag in the macro signature/usage.

{% set relation = elementary.get_elementary_relation('dbt_tests') %}
{% if execute and relation %}
-{% set tests = graph.nodes.values() | selectattr('resource_type', '==', 'test') %}
+{% set tests = graph.nodes.values() | selectattr('resource_type', '==', 'test') | selectattr('package_name', '==', project_name) %}

⚠️ Potential issue | 🟠 Major

Make project scoping opt‑in (behavior change).

Line 4 filters out tests from packages/dependencies. That’s a behavior change; per PR discussion this should be gated by a config/flag with a default that preserves current behavior. Please add an opt‑in guard and apply consistently across all artifact uploads.

🤖 Prompt for AI Agents
In `@macros/edr/dbt_artifacts/upload_dbt_tests.sql` at line 4, The current line
that defines tests by filtering graph.nodes to resource_type == 'test' and
package_name == project_name introduces a behavior change by excluding
dependency tests; make this scoping opt-in by adding a configuration flag (e.g.,
scope_tests_to_project defaulting to false) and only apply the package_name ==
project_name filter when that flag is true; update the tests set definition (the
{% set tests = ... %} expression) to conditionally include the package_name
filter based on that flag and ensure the same opt-in guard is added wherever
artifact uploads perform project-scoped filtering so all artifact uploads behave
consistently.

@elazarlachkar
Contributor

@cedric-orange thanks for resolving my comments.

I now realise that this PR changes the behavior a bit: currently the artifacts also include artifacts from dbt package dependencies (for example, if you use dbt_project_evaluator as a dependency, its models are included).
I'll check with our Product team whether this behavior change is OK.

If it is not, we'll need to disable your feature by default and add a flag that enables it.
I'm thinking of a generic macro which filters "entities" from the Graph. This macro can check for this flag, and be called by the different "upload_x_artifacts". Let me know what you think.
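
A minimal sketch of such a generic macro, under a few assumptions: the macro name `filter_to_current_project` is hypothetical, the var name `limit_artifacts_to_current_project` is borrowed from the CodeRabbit suggestion earlier in this thread rather than from an existing option, and an unset var is assumed to evaluate as falsy so that the current behavior stays the default.

```jinja
{# Hypothetical helper; the macro and var names are assumptions. #}
{% macro filter_to_current_project(entities) %}
  {% set entities = entities | list %}
  {# Opt-in: only scope to the current project when the flag is enabled. #}
  {% if elementary.get_config_var('limit_artifacts_to_current_project') %}
    {% do return(entities | selectattr('package_name', '==', project_name) | list) %}
  {% endif %}
  {# Default: keep every entity, including those from package dependencies. #}
  {% do return(entities) %}
{% endmacro %}
```

Each upload macro could then wrap its existing selection, for example:

```jinja
{% set models = elementary.filter_to_current_project(
    graph.nodes.values() | selectattr('resource_type', '==', 'model')
) %}
```

Keeping the flag check in one place would mean that groups, metrics, or any artifact type added later gets the same scoping without repeating the condition in every file.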

@github-actions github-actions bot removed the Stale label Jan 21, 2026