Add gold table: marketing_excellent_cohort_revenue for Excellent-tier products#10
Add gold table: marketing_excellent_cohort_revenue for Excellent-tier products#10
Conversation
📝 WalkthroughWalkthroughExpands Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
dbt/models/marts/marketing/schema.yml (1)
228-237: Harden cohort tests to fail on NULL metrics.
cohort_total_revenueandpct_of_cohort_revenuebeing NULL would currently pass these checks. Adding NULL guards makes the tests align with “must be positive / within 0–100.”✅ Suggested test tightening
- sql: "SELECT * FROM {{ model }} WHERE cohort_total_revenue <= 0" + sql: "SELECT * FROM {{ model }} WHERE cohort_total_revenue IS NULL OR cohort_total_revenue <= 0" ... - sql: "SELECT * FROM {{ model }} WHERE pct_of_cohort_revenue < 0 OR pct_of_cohort_revenue > 100" + sql: "SELECT * FROM {{ model }} WHERE pct_of_cohort_revenue IS NULL OR pct_of_cohort_revenue < 0 OR pct_of_cohort_revenue > 100"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dbt/models/marts/marketing/schema.yml` around lines 228 - 237, Update the failing SQL tests so NULL values are treated as failures: in the "cohort_total_revenue is positive" test (referring to cohort_total_revenue and {{ model }}) add a condition to the WHERE that also selects rows where cohort_total_revenue IS NULL (e.g., OR cohort_total_revenue IS NULL) alongside cohort_total_revenue <= 0; similarly update the "pct_of_cohort_revenue is within valid range" test to also select rows where pct_of_cohort_revenue IS NULL (e.g., OR pct_of_cohort_revenue IS NULL) in addition to the existing < 0 OR > 100 checks so NULL metrics cause the test to fail.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dbt/models/marts/core/schema.yml`:
- Around line 149-156: Update the description for the product_total_revenue
metric to reference the actual column name total_units_sold (instead of
units_sold) and adjust the formula wording accordingly (e.g., "unit_price ×
total_units_sold"); keep the rest of the contextual note about downstream model
aggregation and Excellent rating tier unchanged.
---
Nitpick comments:
In `@dbt/models/marts/marketing/schema.yml`:
- Around line 228-237: Update the failing SQL tests so NULL values are treated
as failures: in the "cohort_total_revenue is positive" test (referring to
cohort_total_revenue and {{ model }}) add a condition to the WHERE that also
selects rows where cohort_total_revenue IS NULL (e.g., OR cohort_total_revenue
IS NULL) alongside cohort_total_revenue <= 0; similarly update the
"pct_of_cohort_revenue is within valid range" test to also select rows where
pct_of_cohort_revenue IS NULL (e.g., OR pct_of_cohort_revenue IS NULL) in
addition to the existing < 0 OR > 100 checks so NULL metrics cause the test to
fail.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
dbt/models/marts/core/schema.ymldbt/models/marts/marketing/marketing_excellent_cohort_revenue.sqldbt/models/marts/marketing/schema.yml
dbt/models/marts/core/schema.yml
Outdated
| - name: product_total_revenue | ||
| description: > | ||
| Total gross revenue generated by the product (unit_price × units_sold). | ||
| This is the primary revenue metric consumed by the downstream | ||
| marketing_excellent_cohort_revenue model, where it is aggregated via | ||
| a window function to produce cohort_total_revenue and | ||
| pct_of_cohort_revenue for all products classified in the Excellent | ||
| rating tier (avg_rating >= 4.5). |
There was a problem hiding this comment.
Clarify revenue description to match column naming.
The description references “units_sold,” but the column is total_units_sold. This is a minor doc inconsistency that can confuse consumers.
📝 Suggested doc fix
- Total gross revenue generated by the product (unit_price × units_sold).
+ Total gross revenue generated by the product (unit_price × total_units_sold).📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: product_total_revenue | |
| description: > | |
| Total gross revenue generated by the product (unit_price × units_sold). | |
| This is the primary revenue metric consumed by the downstream | |
| marketing_excellent_cohort_revenue model, where it is aggregated via | |
| a window function to produce cohort_total_revenue and | |
| pct_of_cohort_revenue for all products classified in the Excellent | |
| rating tier (avg_rating >= 4.5). | |
| - name: product_total_revenue | |
| description: > | |
| Total gross revenue generated by the product (unit_price × total_units_sold). | |
| This is the primary revenue metric consumed by the downstream | |
| marketing_excellent_cohort_revenue model, where it is aggregated via | |
| a window function to produce cohort_total_revenue and | |
| pct_of_cohort_revenue for all products classified in the Excellent | |
| rating tier (avg_rating >= 4.5). |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbt/models/marts/core/schema.yml` around lines 149 - 156, Update the
description for the product_total_revenue metric to reference the actual column
name total_units_sold (instead of units_sold) and adjust the formula wording
accordingly (e.g., "unit_price × total_units_sold"); keep the rest of the
contextual note about downstream model aggregation and Excellent rating tier
unchanged.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dbt/models/marts/core/dim_products.sql (1)
27-32:⚠️ Potential issue | 🟠 MajorBreaking change to public tier-1 model without notification or documentation.
The
Excellentthreshold has been raised from4.5→4.8, which silently reclassifies products withavg_rating ∈ [4.5, 4.8)fromExcellenttoGood. This directly impactsmarketing_excellent_cohort_revenue.sql, which filterswhere rating_tier = 'Excellent'— that model will now exclude products previously included, narrowing its cohort without any warning to consumers.The problem is compounded by a documented contradiction:
marketing_excellent_cohort_revenue.sqlheader states: "Revenue analysis for products in the Excellent rating cohort (avg_rating >= 4.5)"- The actual implementation is now
avg_rating >= 4.8Required actions:
- Resolve the threshold mismatch: clarify whether the intent is
>= 4.5(update code) or>= 4.8(update comment and downstream models).- Notify the configured listeners (sofia.gutierrez, marcus.chen, jordan.blake) of this breaking change and its impact on
marketing_excellent_cohort_revenue.- Add a changelog entry or data-contract documentation describing the semantic break and cohort narrowing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dbt/models/marts/core/dim_products.sql` around lines 27 - 32, The rating bucket threshold change in dim_products.sql (case on r.avg_rating producing rating_tier) raised "Excellent" from 4.5 to 4.8 and breaks downstream consumers (notably marketing_excellent_cohort_revenue.sql which claims avg_rating >= 4.5); decide which behavior is intended and then either (A) revert the rating_tier rule to when r.avg_rating >= 4.5 to preserve existing contract, or (B) keep >= 4.8 but update the marketing_excellent_cohort_revenue.sql header and its filters to >= 4.8; in either case update data-contract docs/changelog to record the change, and notify listeners (sofia.gutierrez, marcus.chen, jordan.blake) about the decision and impact on marketing_excellent_cohort_revenue.
♻️ Duplicate comments (1)
dbt/models/marts/core/schema.yml (1)
179-187:units_soldin the description still mismatches the column nametotal_units_sold.Line 181 still reads
unit_price × units_sold; the correct column name istotal_units_sold.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dbt/models/marts/core/schema.yml` around lines 179 - 187, The description for the metric product_total_revenue incorrectly refers to the column units_sold; update the text to reference the actual column name total_units_sold (e.g., change "unit_price × units_sold" to "unit_price × total_units_sold") in the product_total_revenue description so it matches the schema and downstream references.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dbt/models/marts/core/schema.yml`:
- Around line 107-111: The business_logic test "Positive plus negative reviews
do not exceed total review count" is NULL-unsafe because positive_reviews and
negative_reviews can be NULL from the LEFT JOIN; update the test SQL to use
COALESCE (or equivalent) so the arithmetic treats NULLs as 0 (e.g., replace
(positive_reviews + negative_reviews) with (COALESCE(positive_reviews,0) +
COALESCE(negative_reviews,0))) and consider applying COALESCE to review_count as
well to ensure the comparison correctly flags rows where the summed subcounts
exceed the total.
---
Outside diff comments:
In `@dbt/models/marts/core/dim_products.sql`:
- Around line 27-32: The rating bucket threshold change in dim_products.sql
(case on r.avg_rating producing rating_tier) raised "Excellent" from 4.5 to 4.8
and breaks downstream consumers (notably marketing_excellent_cohort_revenue.sql
which claims avg_rating >= 4.5); decide which behavior is intended and then
either (A) revert the rating_tier rule to when r.avg_rating >= 4.5 to preserve
existing contract, or (B) keep >= 4.8 but update the
marketing_excellent_cohort_revenue.sql header and its filters to >= 4.8; in
either case update data-contract docs/changelog to record the change, and notify
listeners (sofia.gutierrez, marcus.chen, jordan.blake) about the decision and
impact on marketing_excellent_cohort_revenue.
---
Duplicate comments:
In `@dbt/models/marts/core/schema.yml`:
- Around line 179-187: The description for the metric product_total_revenue
incorrectly refers to the column units_sold; update the text to reference the
actual column name total_units_sold (e.g., change "unit_price × units_sold" to
"unit_price × total_units_sold") in the product_total_revenue description so it
matches the schema and downstream references.
| - name: "Positive plus negative reviews do not exceed total review count" | ||
| type: "business_logic" | ||
| sql: "SELECT * FROM {{ model }} WHERE (positive_reviews + negative_reviews) > review_count" | ||
| expected_result: "0 rows" | ||
| description: "The sum of positive and negative reviews must not exceed the total review_count, since the two sub-counts are a subset of all reviews." |
There was a problem hiding this comment.
NULL-unsafe arithmetic in review-count consistency test.
dim_products.sql uses LEFT JOIN reviews, so positive_reviews and negative_reviews can be NULL. When either operand is NULL, (positive_reviews + negative_reviews) evaluates to NULL and NULL > review_count is unknown — the row is silently excluded, making the test a false pass for products with no review data.
🛡️ Proposed fix
- sql: "SELECT * FROM {{ model }} WHERE (positive_reviews + negative_reviews) > review_count"
+ sql: "SELECT * FROM {{ model }} WHERE (COALESCE(positive_reviews, 0) + COALESCE(negative_reviews, 0)) > COALESCE(review_count, 0)"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: "Positive plus negative reviews do not exceed total review count" | |
| type: "business_logic" | |
| sql: "SELECT * FROM {{ model }} WHERE (positive_reviews + negative_reviews) > review_count" | |
| expected_result: "0 rows" | |
| description: "The sum of positive and negative reviews must not exceed the total review_count, since the two sub-counts are a subset of all reviews." | |
| - name: "Positive plus negative reviews do not exceed total review count" | |
| type: "business_logic" | |
| sql: "SELECT * FROM {{ model }} WHERE (COALESCE(positive_reviews, 0) + COALESCE(negative_reviews, 0)) > COALESCE(review_count, 0)" | |
| expected_result: "0 rows" | |
| description: "The sum of positive and negative reviews must not exceed the total review_count, since the two sub-counts are a subset of all reviews." |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbt/models/marts/core/schema.yml` around lines 107 - 111, The business_logic
test "Positive plus negative reviews do not exceed total review count" is
NULL-unsafe because positive_reviews and negative_reviews can be NULL from the
LEFT JOIN; update the test SQL to use COALESCE (or equivalent) so the arithmetic
treats NULLs as 0 (e.g., replace (positive_reviews + negative_reviews) with
(COALESCE(positive_reviews,0) + COALESCE(negative_reviews,0))) and consider
applying COALESCE to review_count as well to ensure the comparison correctly
flags rows where the summed subcounts exceed the total.
Summary
This PR introduces a new gold-layer table,
marketing_excellent_cohort_revenue, which provides a revenue analysis scoped exclusively to products in the Excellent rating tier (avg_rating >= 4.5). Each row represents a single product and includes core product attributes, profitability metrics, and two cohort-level aggregations:cohort_total_revenue(total revenue across all Excellent-tier products via a window function) andpct_of_cohort_revenue(each product's percentage share of that cohort total).The model sources directly from
dim_products, filtering torating_tier = 'Excellent'. Results are ordered byproduct_total_revenuedescending, making the output immediately useful for marketing and merchandising teams prioritizing high-performing products. The use ofNULLIFin the percentage calculation guards against division-by-zero edge cases.Seven tests have been added to ensure data integrity: tier-filter correctness, avg_rating threshold enforcement (>= 4.5), non-negative revenue values, positive cohort totals, valid percentage ranges (0–100%), and standard uniqueness/not-null checks on
product_id. These tests collectively protect against filter regressions, bad source data, and window function miscalculations.Business impact is focused on enabling marketing teams to identify and act on top-revenue products within the highest-quality tier, supporting campaigns, promotions, and inventory decisions aligned to customer satisfaction signals.
Tags:
gold,marketing,revenue,cohort-analysis,product-analytics,rating-tier,window-function,table-materializationCriticality:
P2Models (1)
marketing_excellent_cohort_revenueLineage
graph LR dim_products[dim_products]:::gold marketing_excellent_cohort_revenue[marketing_excellent_cohort_revenue] dim_products --> marketing_excellent_cohort_revenue classDef staging fill:#e3f2fd,stroke:#1565c0,color:#0d47a1 classDef intermediate fill:#fff3e0,stroke:#e65100,color:#bf360c classDef gold fill:#e8f5e9,stroke:#2e7d32,color:#1b5e20Data Quality Tests (7)
Generated by Data Portal
Summary by CodeRabbit
New Features
Documentation
Behavioral Changes