📝 Walkthrough

A new dbt gold model is introduced that aggregates order data by customer, computing metrics including total orders, total revenue, average order value, order date ranges, and customer lifetime days from staging tables.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (4)
dbt/models/gold/gold_order_summary.sql (4)
Line 19: Prefer explicit `group by o.customer_id` over positional `group by 1`

Positional references are fragile: inserting a column before `customer_id` in the select list silently changes the grouping semantics without a compile error.

♻️ Proposed fix

```diff
- group by 1
+ group by o.customer_id
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbt/models/gold/gold_order_summary.sql` at line 19, replace the positional GROUP BY with an explicit column reference: change the current "group by 1" to "group by o.customer_id" so the aggregation in the gold_order_summary query (the grouping for alias o.customer_id) remains correct and robust to column reordering.
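The fragility this comment describes is easy to reproduce outside dbt. The following sketch uses Python's built-in `sqlite3` with an invented `orders` table (not the project's real data): the same `group by 1` silently changes meaning when a column is added in front of `customer_id`.

```python
# Illustration of the "group by 1" hazard with a hypothetical table.
# The ordinal binds to whatever column is first in the SELECT list,
# so reordering columns changes the aggregation with no error.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (order_id int, customer_id int, amount real);
    insert into orders values (1, 10, 5.0), (2, 10, 7.0), (3, 20, 3.0);
""")

# Intended query: one row per customer.
by_customer = conn.execute(
    "select customer_id, count(*) from orders group by 1"
).fetchall()

# Someone inserts order_id before customer_id; "group by 1" now
# groups by order_id, producing one row per order instead.
by_order = conn.execute(
    "select order_id, customer_id, count(*) from orders group by 1"
).fetchall()

print(len(by_customer))  # 2 rows: customers 10 and 20
print(len(by_order))     # 3 rows: one per order_id
```

An explicit `group by customer_id` would have made the second query fail loudly (ambiguous intent) rather than return different results.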
Line 21: Avoid `select *` in a gold-layer model

`select * from final` will silently include any new columns added to upstream CTEs, making the model's public schema non-deterministic and harder to document/test. Explicit column enumeration is standard practice for gold-layer tables.

♻️ Proposed fix

```diff
-select * from final
+select
+    customer_id,
+    total_orders,
+    total_revenue,
+    avg_order_value,
+    first_order_date,
+    last_order_date,
+    customer_lifetime_days
+from final
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbt/models/gold/gold_order_summary.sql` at line 21, the model currently uses a wildcard "select * from final", which makes the gold table schema non-deterministic; replace the wildcard with an explicit, ordered column list from the final CTE (customer_id, total_orders, total_revenue, avg_order_value, first_order_date, last_order_date, customer_lifetime_days), preserving aliases, and update any downstream tests/docs to match the explicit schema defined in gold_order_summary.sql.
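The schema-drift risk can also be shown concretely. This sketch (hypothetical table and column names, Python stdlib `sqlite3`) simulates an upstream CTE gaining a column and shows the wildcard select widening the downstream schema without any change to the model itself.

```python
# Sketch of the "select *" hazard: when an upstream relation gains a
# column, a wildcard select silently changes the output schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table final (customer_id int, total_orders int)")

# Column names the wildcard resolves to today.
cols_before = [d[0] for d in conn.execute("select * from final").description]

# Upstream change: a new column appears in the relation.
conn.execute("alter table final add column refund_total real")

# The same unchanged query now exposes a wider schema.
cols_after = [d[0] for d in conn.execute("select * from final").description]

print(cols_before)  # ['customer_id', 'total_orders']
print(cols_after)   # ['customer_id', 'total_orders', 'refund_total']
```

An explicit column list pins the contract: the downstream schema only changes when the model file does.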
Lines 1-21: Missing `{{ config() }}` block; materialization and tags not enforced in the model

The PR description specifies materialization `table` and tags `orders`, `revenue`, `gold`, but there is no `{{ config() }}` block in the file. Without it the model inherits project-level defaults, which may not be `table`, and the tags won't be attached for use in `dbt run --select tag:gold`.

♻️ Proposed addition

```diff
+{{ config(
+    materialized='table',
+    tags=['orders', 'revenue', 'gold']
+) }}
+
 -- Gold table: Order summary metrics by customer segment
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbt/models/gold/gold_order_summary.sql` around lines 1-21, this model is missing a dbt config block, so it won't enforce materialization and tags; add a top-of-file config for the gold_order_summary model to set materialized='table' and tags=['orders','revenue','gold'] so the model uses table materialization and is selectable by tag:gold; place the config before the CTEs (before the with orders as (...) block) so it applies to the final select.
Line 16: Use `{{ dbt.datediff() }}` instead of raw `datediff()` for cross-database portability

`datediff` argument order differs between warehouses: BigQuery uses `DATETIME_DIFF(end, start, granularity)` while other databases use `DATEDIFF(datepart, start, end)`. This model will fail on BigQuery with the raw `datediff()` syntax.

dbt v1.2+ provides the `dbt.datediff()` macro for cross-database compatibility. It accepts SQL expressions (including aggregates in grouped queries) and automatically translates to the correct warehouse syntax.

♻️ Proposed fix: use the `dbt.datediff` macro

```diff
- datediff('day', min(o.order_date), max(o.order_date)) as customer_lifetime_days
+ {{ dbt.datediff("min(o.order_date)", "max(o.order_date)", "day") }} as customer_lifetime_days
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dbt/models/gold/gold_order_summary.sql` at line 16, the expression calculating customer_lifetime_days uses a raw datediff(...) call which is not portable to BigQuery; replace datediff('day', min(o.order_date), max(o.order_date)) with the macro form {{ dbt.datediff("min(o.order_date)", "max(o.order_date)", "day") }} so the customer_lifetime_days calculation is cross-database compatible with the correct argument ordering.
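Whichever syntax the warehouse wants, the quantity being computed is the same: whole days between a customer's first and last order. A plain-Python sketch with invented dates makes the intended semantics of `customer_lifetime_days` explicit:

```python
# What customer_lifetime_days means, independent of warehouse syntax:
# day count from the earliest to the latest order date.
from datetime import date

# Hypothetical order dates for one customer.
order_dates = [date(2024, 1, 5), date(2024, 3, 1), date(2024, 2, 10)]

lifetime_days = (max(order_dates) - min(order_dates)).days
print(lifetime_days)  # 56 days between 2024-01-05 and 2024-03-01
```

Note this is a difference, not an inclusive span: a customer with a single order gets 0, not 1.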
```sql
customers as (
    select * from {{ ref('stg_customers') }}
),
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# First, let's read the actual file content to verify the review claims
cat -n dbt/models/gold/gold_order_summary.sql
```

Repository: zingleai/dataPortal_demo
Length of output: 1736

🏁 Script executed:

```bash
#!/bin/bash
# Search for stg_customers definition to understand the data structure
find . -type f -name "*stg_customers*" | head -20
```

Repository: zingleai/dataPortal_demo
Length of output: 428

🏁 Script executed:

```bash
#!/bin/bash
# Look for schema tests on stg_customers to check for uniqueness constraints
fd "schema" --type f -g "*.yml" -g "*.yaml" | xargs rg -l "stg_customers" 2>/dev/null | head -10
```

Repository: zingleai/dataPortal_demo
Length of output: 574

🏁 Script executed:

```bash
#!/bin/bash
# Read the actual gold_order_summary.sql file
cat -n dbt/models/gold/gold_order_summary.sql
```

Repository: zingleai/dataPortal_demo
Length of output: 865

🏁 Script executed:

```bash
#!/bin/bash
# Read the stg_customers model
cat -n dbt/models/staging/stg_customers.sql
```

Repository: zingleai/dataPortal_demo
Length of output: 547

🏁 Script executed:

```bash
#!/bin/bash
# Read the schema.yml which should define the unique test
cat -n dbt/models/staging/schema.yml
```

Repository: zingleai/dataPortal_demo
Length of output: 13065
Remove unnecessary customers CTE join or clarify its purpose

The `customers` CTE is joined but only `c.customer_id` is selected in the output, which is identical to `o.customer_id` under the join predicate. The columns available from `stg_customers` (`first_name`, `last_name`) are not included. This creates two issues:

- Silent data loss: the `INNER JOIN` silently drops any order rows whose `customer_id` doesn't exist in `stg_customers`. If there are orphaned orders (e.g., due to data quality issues), `total_revenue` and `total_orders` will be understated without warning.
- Unnecessary complexity: if customer enrichment columns are not intended, the join adds complexity without benefit. You can aggregate directly from `orders`:
Proposed fix: remove redundant join

```diff
-customers as (
-    select * from {{ ref('stg_customers') }}
-),
 final as (
     select
-        c.customer_id,
+        o.customer_id,
         count(distinct o.order_id) as total_orders,
         sum(o.amount) as total_revenue,
         avg(o.amount) as avg_order_value,
         min(o.order_date) as first_order_date,
         max(o.order_date) as last_order_date,
         datediff('day', min(o.order_date), max(o.order_date)) as customer_lifetime_days
-    from orders o
-    join customers c on o.customer_id = c.customer_id
-    group by 1
+    from orders o
+    group by o.customer_id
 )
```

If customer enrichment columns (`first_name`, `last_name`) are intended, add them to the output and document the enrichment intent in a comment.
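The silent-data-loss claim is worth seeing end to end. This sketch uses Python's stdlib `sqlite3` with invented rows, including one orphaned order whose `customer_id` has no match in `customers`: the inner join understates revenue, while aggregating straight from `orders` does not.

```python
# Demonstration of inner-join row dropping with hypothetical data:
# order 2 references customer 99, who is missing from customers.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (order_id int, customer_id int, amount real);
    create table customers (customer_id int);
    insert into orders values (1, 10, 100.0), (2, 99, 50.0);
    insert into customers values (10);
""")

# Revenue through the inner join: the orphaned order vanishes.
joined_revenue = conn.execute("""
    select sum(o.amount)
    from orders o
    join customers c on o.customer_id = c.customer_id
""").fetchone()[0]

# Revenue aggregated directly from orders: nothing is dropped.
direct_revenue = conn.execute(
    "select sum(amount) from orders"
).fetchone()[0]

print(joined_revenue)  # 100.0, understated with no warning
print(direct_revenue)  # 150.0
```

A `LEFT JOIN` from `orders` would preserve the row too, which is why the comment offers that as the alternative when enrichment columns are genuinely wanted.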
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbt/models/gold/gold_order_summary.sql` around lines 5-7, the customers CTE (customers and {{ ref('stg_customers') }})
(customers and {{ ref('stg_customers') }}) is redundant and causes silent
row-dropping because you INNER JOIN to it but only select c.customer_id (which
equals o.customer_id); either remove the customers CTE and its join and
aggregate directly from the orders source (e.g., use orders/order_summary CTEs)
or, if you intend customer enrichment, keep customers, include enrichment
columns (e.g., first_name, last_name) in the SELECT, and convert the INNER JOIN
to a LEFT JOIN to avoid losing orders for missing customers; also add a brief
comment explaining the enrichment intent if you retain the join.
Summary

Order summary metrics by customer: total orders, revenue, avg order value, lifetime days.

Tags: `orders`, `revenue`, `gold`
Criticality: P1
Models (1): `gold_order_summary`
Lineage
Data Quality Tests (2)
Data Quality Tests (2)
Generated by Data Portal