[E2E Test] New model: gold_order_summary#8

Open
zingleai wants to merge 1 commit into main from dp/newmodel

Conversation

@zingleai (Owner) commented Feb 23, 2026

Summary

Order summary metrics by customer — total orders, revenue, avg order value, lifetime days.

Tags: orders, revenue, gold
Criticality: P1

Models (1)

| Model | Layer | Materialization | Columns | Upstream |
| --- | --- | --- | --- | --- |
| gold_order_summary | gold | table | 7 | stg_orders, stg_customers |

Lineage

graph LR; stg_orders --> gold_order_summary; stg_customers --> gold_order_summary

Data Quality Tests (2)

  • [gold_order_summary] not_null_customer_id: Ensure customer_id is never null
  • [gold_order_summary] positive_revenue: Revenue must be non-negative
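In dbt, these two tests would typically be declared in the model's schema file. A minimal sketch, assuming the column names from the PR description and using `dbt_utils.accepted_range` as one common way to express the non-negative check (the actual repo may implement `positive_revenue` differently, e.g. as a singular test):

```yaml
# Hypothetical dbt/models/gold/schema.yml entry for the two tests listed above.
# Column names and the accepted_range choice are assumptions, not the repo's file.
models:
  - name: gold_order_summary
    columns:
      - name: customer_id
        tests:
          - not_null                     # not_null_customer_id
      - name: total_revenue
        tests:
          - dbt_utils.accepted_range:    # positive_revenue: revenue >= 0
              min_value: 0
```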

Generated by Data Portal

Summary by CodeRabbit

  • New Features
    • Added customer order summary analytics providing insights into total orders, total revenue, average order value, and customer lifetime metrics per customer account.


coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

A new dbt gold model is introduced that aggregates order data by customer, computing metrics including total orders, total revenue, average order value, order date ranges, and customer lifetime days from staging tables.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| New Gold Model<br>`dbt/models/gold/gold_order_summary.sql` | Introduces order summary aggregations per customer, computing distinct order counts, revenue totals, averages, and date-based metrics from stg_orders and stg_customers staging sources. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A golden table now takes its place,
Customer summaries at a rapid pace,
Orders counted, revenue summed so neat,
Lifetime values making our data complete! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title references a new dbt model for order summary matching the main changeset, though the '[E2E Test]' prefix is somewhat unclear in context. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
dbt/models/gold/gold_order_summary.sql (4)

19-19: Prefer explicit group by o.customer_id over positional group by 1

Positional references are fragile — inserting a column before customer_id silently changes grouping semantics without a compile error.

♻️ Proposed fix
-    group by 1
+    group by o.customer_id
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/gold/gold_order_summary.sql` at line 19, Replace the positional
GROUP BY with an explicit column reference: change the current "group by 1" to
"group by o.customer_id" so the aggregation in the gold_order_summary query (the
grouping for alias o.customer_id) remains correct and robust to column
reordering.

21-21: Avoid select * in a gold-layer model

select * from final will silently include any new columns added to upstream CTEs, making the model's public schema non-deterministic and harder to document/test. Explicit column enumeration is standard practice for gold-layer tables.

♻️ Proposed fix
-select * from final
+select
+    customer_id,
+    total_orders,
+    total_revenue,
+    avg_order_value,
+    first_order_date,
+    last_order_date,
+    customer_lifetime_days
+from final
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/gold/gold_order_summary.sql` at line 21, The model currently uses
a wildcard "select * from final" which makes the gold table schema
non-deterministic; replace the wildcard with an explicit, ordered column list
from the final CTE (e.g., list each column name produced by the final CTE in the
SELECT, preserving any aliases and data types), ensure you include only the
intended columns (created_at, order_id, customer_id, total_amount, etc. as
applicable) and update any downstream tests/docs to match the explicit schema
defined in gold_order_summary.sql.

1-21: Missing {{ config() }} block — materialization and tags not enforced in the model

The PR description specifies materialization: table and tags orders, revenue, gold, but there is no {{ config() }} block in the file. Without it the model inherits project-level defaults, which may not be table, and the tags won't be attached for use in dbt run --select tag:gold.

♻️ Proposed addition
+{{ config(
+    materialized='table',
+    tags=['orders', 'revenue', 'gold']
+) }}
+
 -- Gold table: Order summary metrics by customer segment
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/gold/gold_order_summary.sql` around lines 1 - 21, This model is
missing a dbt config block so it won't enforce materialization and tags; add a
top-of-file config for the gold_order_summary model (using the same file's
context) to set materialized='table' and tags=['orders','revenue','gold'] so the
model uses table materialization and is selectable by tag:gold; place the config
before the CTEs (before the with orders as (...) block) so it applies to the
final select.

16-16: Use {{ dbt.datediff() }} instead of raw datediff() for cross-database portability

datediff argument order differs between warehouses — BigQuery uses DATETIME_DIFF(end, start, granularity) while other databases use DATEDIFF(datepart, start, end). This model will fail on BigQuery with the raw datediff() syntax.

dbt v1.2+ provides the dbt.datediff() macro for cross-database compatibility. It accepts SQL expressions (including aggregates in grouped queries) and automatically translates to the correct warehouse syntax.

♻️ Proposed fix — use the `dbt.datediff` macro
-        datediff('day', min(o.order_date), max(o.order_date)) as customer_lifetime_days
+        {{ dbt.datediff("min(o.order_date)", "max(o.order_date)", "day") }} as customer_lifetime_days
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/gold/gold_order_summary.sql` at line 16, The expression
calculating customer_lifetime_days uses the raw datediff(...) call which is not
portable to BigQuery; replace datediff('day', min(o.order_date),
max(o.order_date)) with the dbt macro form dbt.datediff(min(o.order_date),
max(o.order_date), 'day') so the model gold_order_summary.sql (the
customer_lifetime_days calculation) uses {{ dbt.datediff(...) }} for
cross-database compatibility and correct argument ordering.
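For context, the argument-order divergence the comment describes looks roughly like this. Illustrative SQL only, using the column names from the review diff; the exact SQL that `dbt.datediff()` renders depends on the adapter:

```sql
-- Days between first and last order, written in each warehouse's native dialect.

-- Snowflake / Redshift style: the datepart comes first.
select datediff('day', min(order_date), max(order_date)) from orders;

-- BigQuery style: the granularity comes last, and the minuend comes first.
select date_diff(max(order_date), min(order_date), day) from orders;
```

The `{{ dbt.datediff("min(o.order_date)", "max(o.order_date)", "day") }}` macro call in the proposed fix compiles to the appropriate form for whichever adapter the project targets.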

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1fca229 and e2bb912.

📒 Files selected for processing (1)
  • dbt/models/gold/gold_order_summary.sql

Comment on lines +5 to +7
customers as (
select * from {{ ref('stg_customers') }}
),

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's read the actual file content to verify the review claims
cat -n dbt/models/gold/gold_order_summary.sql

Repository: zingleai/dataPortal_demo

Length of output: 1736


🏁 Script executed:

#!/bin/bash
# Search for stg_customers definition to understand the data structure
find . -type f -name "*stg_customers*" | head -20

Repository: zingleai/dataPortal_demo

Length of output: 428


🏁 Script executed:

#!/bin/bash
# Look for schema tests on stg_customers to check for uniqueness constraints
fd "schema" --type f -g "*.yml" -g "*.yaml" | xargs rg -l "stg_customers" 2>/dev/null | head -10

Repository: zingleai/dataPortal_demo

Length of output: 574


🏁 Script executed:

#!/bin/bash
# Read the actual gold_order_summary.sql file
cat -n dbt/models/gold/gold_order_summary.sql

Repository: zingleai/dataPortal_demo

Length of output: 865


🏁 Script executed:

#!/bin/bash
# Read the stg_customers model
cat -n dbt/models/staging/stg_customers.sql

Repository: zingleai/dataPortal_demo

Length of output: 547


🏁 Script executed:

#!/bin/bash
# Read the schema.yml which should define the unique test
cat -n dbt/models/staging/schema.yml

Repository: zingleai/dataPortal_demo

Length of output: 13065


Remove unnecessary customers CTE join or clarify its purpose

The customers CTE is joined but only c.customer_id is selected in the output, which is identical to o.customer_id under the join predicate. The columns available from stg_customers (first_name, last_name) are not included. This creates two issues:

  1. Silent data loss: The INNER JOIN silently drops any order rows whose customer_id doesn't exist in stg_customers. If there are orphaned orders (e.g., due to data quality issues), total_revenue and total_orders will be understated without warning.

  2. Unnecessary complexity: If customer enrichment columns are not intended, the join adds complexity without benefit. You can aggregate directly from orders:

Proposed fix — remove redundant join
-customers as (
-    select * from {{ ref('stg_customers') }}
-),
 final as (
     select
-        c.customer_id,
+        o.customer_id,
         count(distinct o.order_id) as total_orders,
         sum(o.amount) as total_revenue,
         avg(o.amount) as avg_order_value,
         min(o.order_date) as first_order_date,
         max(o.order_date) as last_order_date,
         datediff('day', min(o.order_date), max(o.order_date)) as customer_lifetime_days
-    from orders o
-    join customers c on o.customer_id = c.customer_id
-    group by 1
+    from orders o
+    group by o.customer_id
 )

If customer enrichment columns (first_name, last_name) are intended, add them to the output and document the enrichment intent in a comment.
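Pulling the review's four suggestions together (config block, explicit column list, `dbt.datediff()`, and dropping the redundant join), the revised model would look roughly like this. This is a sketch reconstructed from the review diffs, not the file's actual contents; column names come from the proposed fixes above:

```sql
-- Hypothetical consolidated dbt/models/gold/gold_order_summary.sql
-- applying all four review suggestions.
{{ config(
    materialized='table',
    tags=['orders', 'revenue', 'gold']
) }}

with orders as (
    select * from {{ ref('stg_orders') }}
),

final as (
    select
        o.customer_id,
        count(distinct o.order_id) as total_orders,
        sum(o.amount) as total_revenue,
        avg(o.amount) as avg_order_value,
        min(o.order_date) as first_order_date,
        max(o.order_date) as last_order_date,
        -- cross-database date difference instead of raw datediff()
        {{ dbt.datediff("min(o.order_date)", "max(o.order_date)", "day") }}
            as customer_lifetime_days
    from orders o
    group by o.customer_id
)

-- explicit column list keeps the gold-layer schema deterministic
select
    customer_id,
    total_orders,
    total_revenue,
    avg_order_value,
    first_order_date,
    last_order_date,
    customer_lifetime_days
from final
```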

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/gold/gold_order_summary.sql` around lines 5 - 7, The customers CTE
(customers and {{ ref('stg_customers') }}) is redundant and causes silent
row-dropping because you INNER JOIN to it but only select c.customer_id (which
equals o.customer_id); either remove the customers CTE and its join and
aggregate directly from the orders source (e.g., use orders/order_summary CTEs)
or, if you intend customer enrichment, keep customers, include enrichment
columns (e.g., first_name, last_name) in the SELECT, and convert the INNER JOIN
to a LEFT JOIN to avoid losing orders for missing customers; also add a brief
comment explaining the enrichment intent if you retain the join.
