21 changes: 21 additions & 0 deletions dbt/models/gold/gold_order_summary.sql
@@ -0,0 +1,21 @@
-- Gold table: Order summary metrics by customer segment
with orders as (
    select * from {{ ref('stg_orders') }}
),
customers as (
    select * from {{ ref('stg_customers') }}
),
Comment on lines +5 to +7

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's read the actual file content to verify the review claims
cat -n dbt/models/gold/gold_order_summary.sql

Repository: zingleai/dataPortal_demo

Length of output: 1736


🏁 Script executed:

#!/bin/bash
# Search for stg_customers definition to understand the data structure
find . -type f -name "*stg_customers*" | head -20

Repository: zingleai/dataPortal_demo

Length of output: 428


🏁 Script executed:

#!/bin/bash
# Look for schema tests on stg_customers to check for uniqueness constraints
fd "schema" --type f -g "*.yml" -g "*.yaml" | xargs rg -l "stg_customers" 2>/dev/null | head -10

Repository: zingleai/dataPortal_demo

Length of output: 574


🏁 Script executed:

#!/bin/bash
# Read the actual gold_order_summary.sql file
cat -n dbt/models/gold/gold_order_summary.sql

Repository: zingleai/dataPortal_demo

Length of output: 865


🏁 Script executed:

#!/bin/bash
# Read the stg_customers model
cat -n dbt/models/staging/stg_customers.sql

Repository: zingleai/dataPortal_demo

Length of output: 547


🏁 Script executed:

#!/bin/bash
# Read the schema.yml which should define the unique test
cat -n dbt/models/staging/schema.yml

Repository: zingleai/dataPortal_demo

Length of output: 13065


Remove unnecessary customers CTE join or clarify its purpose

The customers CTE is joined but only c.customer_id is selected in the output, which is identical to o.customer_id under the join predicate. The columns available from stg_customers (first_name, last_name) are not included. This creates two issues:

  1. Silent data loss: The INNER JOIN silently drops any order rows whose customer_id doesn't exist in stg_customers. If there are orphaned orders (e.g., due to data quality issues), total_revenue and total_orders will be understated without warning.

  2. Unnecessary complexity: If customer enrichment columns are not intended, the join adds complexity without benefit. You can aggregate directly from orders:

Proposed fix — remove redundant join
-customers as (
-    select * from {{ ref('stg_customers') }}
-),
 final as (
     select
-        c.customer_id,
+        o.customer_id,
         count(distinct o.order_id) as total_orders,
         sum(o.amount) as total_revenue,
         avg(o.amount) as avg_order_value,
         min(o.order_date) as first_order_date,
         max(o.order_date) as last_order_date,
         datediff('day', min(o.order_date), max(o.order_date)) as customer_lifetime_days
-    from orders o
-    join customers c on o.customer_id = c.customer_id
-    group by 1
+    from orders o
+    group by o.customer_id
 )

If customer enrichment columns (first_name, last_name) are intended, add them to the output and document the enrichment intent in a comment.
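
The silent row loss described in point 1 can be reproduced on toy data. The sketch below is an illustration only and not part of the PR: it uses Python's built-in `sqlite3` module, with hypothetical in-memory tables standing in for `stg_orders` and `stg_customers`. The rows are made up, and one order has a `customer_id` (99) with no matching customer.

```python
import sqlite3

# Hypothetical stand-ins for stg_orders and stg_customers (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (order_id integer, customer_id integer, amount real);
    create table customers (customer_id integer, first_name text, last_name text);
    insert into orders values (1, 10, 50.0), (2, 10, 25.0), (3, 99, 100.0);
    insert into customers values (10, 'Ada', 'Lovelace');
""")

# Shape of the current model: inner join, then aggregate.
joined_revenue = conn.execute("""
    select sum(o.amount)
    from orders o
    join customers c on o.customer_id = c.customer_id
""").fetchone()[0]

# Shape of the proposed fix: aggregate from orders alone.
direct_revenue = conn.execute("select sum(amount) from orders").fetchone()[0]

# Anti-join listing the orders that have no matching customer.
orphaned = conn.execute("""
    select o.order_id
    from orders o
    left join customers c on o.customer_id = c.customer_id
    where c.customer_id is null
""").fetchall()

print(joined_revenue, direct_revenue, orphaned)
```

On this data the joined total is 75.0 against 175.0 from `orders` alone, and the anti-join returns order 3. The same anti-join pattern could be run against the real staging models to check whether any orphaned orders exist before deciding which fix to apply.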

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/gold/gold_order_summary.sql` around lines 5-7, the customers CTE
(select * from {{ ref('stg_customers') }}) is redundant and causes silent
row-dropping: the model INNER JOINs to it but selects only c.customer_id, which
equals o.customer_id under the join predicate. Either remove the customers CTE
and its join and aggregate directly from the orders CTE, grouping by
o.customer_id, or, if customer enrichment is intended, keep customers, select
o.customer_id instead of c.customer_id, add the enrichment columns (e.g.,
first_name, last_name) to both the SELECT and the GROUP BY, and convert the
INNER JOIN to a LEFT JOIN so that orders with missing customers are kept. If
the join is retained, add a brief comment explaining the enrichment intent.
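
A minimal sketch of the enrichment variant, assuming the `first_name` and `last_name` columns named in the review. It runs against hypothetical in-memory SQLite tables with made-up rows rather than the real staging models, and it omits the date metrics for brevity. The key is taken from `o.customer_id` so that orders without a matching customer keep a non-null key.

```python
import sqlite3

# Hypothetical stand-ins for stg_orders and stg_customers (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (order_id integer, customer_id integer, amount real);
    create table customers (customer_id integer, first_name text, last_name text);
    insert into orders values (1, 10, 50.0), (2, 10, 25.0), (3, 99, 100.0);
    insert into customers values (10, 'Ada', 'Lovelace');
""")

# LEFT JOIN keeps every order; enrichment columns are NULL when no customer matches.
rows = conn.execute("""
    select
        o.customer_id,
        c.first_name,
        c.last_name,
        count(distinct o.order_id) as total_orders,
        sum(o.amount) as total_revenue
    from orders o
    left join customers c on o.customer_id = c.customer_id
    group by o.customer_id, c.first_name, c.last_name
    order by o.customer_id
""").fetchall()

for row in rows:
    print(row)
```

Customer 99 has no row in `customers`, yet its order still appears in the result with `None` for both name columns, so the revenue total is preserved.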

final as (
    select
        c.customer_id,
        count(distinct o.order_id) as total_orders,
        sum(o.amount) as total_revenue,
        avg(o.amount) as avg_order_value,
        min(o.order_date) as first_order_date,
        max(o.order_date) as last_order_date,
        datediff('day', min(o.order_date), max(o.order_date)) as customer_lifetime_days
    from orders o
    join customers c on o.customer_id = c.customer_id
    group by 1
)
select * from final