Add gold table: marketing_excellent_cohort_revenue for Excellent-tier products #10

Open
zingleai wants to merge 2 commits into main from dp/f499c45a

@zingleai (Owner) commented Feb 23, 2026

Summary

This PR introduces a new gold-layer table, marketing_excellent_cohort_revenue, which provides a revenue analysis scoped exclusively to products in the Excellent rating tier (avg_rating >= 4.5). Each row represents a single product and includes core product attributes, profitability metrics, and two cohort-level aggregations: cohort_total_revenue (total revenue across all Excellent-tier products via a window function) and pct_of_cohort_revenue (each product's percentage share of that cohort total).

The model sources directly from dim_products, filtering to rating_tier = 'Excellent'. Results are ordered by product_total_revenue descending, making the output immediately useful for marketing and merchandising teams prioritizing high-performing products. The use of NULLIF in the percentage calculation guards against division-by-zero edge cases.
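
The model's SQL is not shown in this conversation, so the following is only a minimal sketch of the pattern the summary describes: filter to the Excellent tier, total the revenue with an empty-frame window function, and guard the percentage with NULLIF. It runs against an in-memory SQLite database; the sample rows, the reduced three-column dim_products table, and the helper name cohort_rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical query mirroring the description; the real model SQL may differ.
COHORT_SQL = """
SELECT
    product_id,
    product_total_revenue,
    SUM(product_total_revenue) OVER () AS cohort_total_revenue,
    100.0 * product_total_revenue
        / NULLIF(SUM(product_total_revenue) OVER (), 0) AS pct_of_cohort_revenue
FROM dim_products
WHERE rating_tier = 'Excellent'
ORDER BY product_total_revenue DESC
"""


def cohort_rows(products):
    """Load (product_id, rating_tier, revenue) tuples and run the cohort query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_products ("
        "product_id TEXT, rating_tier TEXT, product_total_revenue REAL)"
    )
    conn.executemany("INSERT INTO dim_products VALUES (?, ?, ?)", products)
    rows = conn.execute(COHORT_SQL).fetchall()
    conn.close()
    return rows


# The window is evaluated after the WHERE filter, so the 'Good' row is
# excluded from the cohort total.
normal = cohort_rows([
    ("p1", "Excellent", 600.0),
    ("p2", "Excellent", 400.0),
    ("p3", "Good", 900.0),
])

# When the cohort total is zero, NULLIF turns the divisor into NULL, so the
# share comes back as NULL (None in Python) instead of a division error.
zero_total = cohort_rows([("z1", "Excellent", 0.0)])
```

Under these assumptions the zero-revenue cohort yields a NULL share rather than an error, which is the edge case the NULLIF guard is meant to cover.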

Seven tests have been added to ensure data integrity: tier-filter correctness, avg_rating threshold enforcement (>= 4.5), non-negative revenue values, positive cohort totals, valid percentage ranges (0–100%), and standard uniqueness/not-null checks on product_id. These tests collectively protect against filter regressions, bad source data, and window function miscalculations.

Business impact is focused on enabling marketing teams to identify and act on top-revenue products within the highest-quality tier, supporting campaigns, promotions, and inventory decisions aligned to customer satisfaction signals.

Tags: gold, marketing, revenue, cohort-analysis, product-analytics, rating-tier, window-function, table-materialization
Criticality: P2

Models (1)

| Model | Layer | Materialization | Columns | Upstream |
|---|---|---|---|---|
| marketing_excellent_cohort_revenue | gold | table | 12 | dim_products |

Lineage

graph LR
    dim_products[dim_products]:::gold
    marketing_excellent_cohort_revenue[marketing_excellent_cohort_revenue]
    dim_products --> marketing_excellent_cohort_revenue

    classDef staging fill:#e3f2fd,stroke:#1565c0,color:#0d47a1
    classDef intermediate fill:#fff3e0,stroke:#e65100,color:#bf360c
    classDef gold fill:#e8f5e9,stroke:#2e7d32,color:#1b5e20

Data Quality Tests (7)

  • [marketing_excellent_cohort_revenue] Only Excellent rating tier products included: All rows must belong to the Excellent rating tier — any other tier value indicates a filter logic regression.
  • [marketing_excellent_cohort_revenue] avg_rating meets Excellent threshold (>= 4.5): Excellent-tier products must have an avg_rating of at least 4.5, consistent with the classification in dim_products.
  • [marketing_excellent_cohort_revenue] No negative product revenue: Revenue values should never be negative; a negative value would indicate bad source data from int_product_profitability.
  • [marketing_excellent_cohort_revenue] cohort_total_revenue is positive: The cohort-level total revenue window function must always be a positive number as long as Excellent products exist.
  • [marketing_excellent_cohort_revenue] pct_of_cohort_revenue is within valid range: Each product's revenue share of the Excellent cohort must fall between 0% and 100%.
  • [marketing_excellent_cohort_revenue] unique_marketing_excellent_cohort_revenue_product_id: unique test on product_id
  • [marketing_excellent_cohort_revenue] not_null_marketing_excellent_cohort_revenue_product_id: not_null test on product_id
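
The test definitions quoted later in this conversation pair a SQL query with an expected result of "0 rows", so each test passes when its query finds no offending rows. A rough sketch of that convention follows, using an in-memory SQLite table. The runner function, the short test keys, and the first three queries are assumptions for illustration; only the cohort_total_revenue and pct_of_cohort_revenue queries are taken from the review comments, and this is not Data Portal's actual implementation.

```python
import sqlite3

# Failing-row queries keyed by hypothetical short names.
TESTS = {
    "only_excellent_tier": "SELECT * FROM {{ model }} WHERE rating_tier <> 'Excellent'",
    "avg_rating_threshold": "SELECT * FROM {{ model }} WHERE avg_rating < 4.5",
    "no_negative_revenue": "SELECT * FROM {{ model }} WHERE product_total_revenue < 0",
    "cohort_total_positive": "SELECT * FROM {{ model }} WHERE cohort_total_revenue <= 0",
    "pct_in_range": (
        "SELECT * FROM {{ model }} "
        "WHERE pct_of_cohort_revenue < 0 OR pct_of_cohort_revenue > 100"
    ),
}


def run_tests(conn, model):
    """Return the number of offending rows per test; 0 means the test passes."""
    failures = {}
    for name, template in TESTS.items():
        sql = template.replace("{{ model }}", model)
        failures[name] = len(conn.execute(sql).fetchall())
    return failures


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cohort (product_id TEXT, rating_tier TEXT, avg_rating REAL, "
    "product_total_revenue REAL, cohort_total_revenue REAL, "
    "pct_of_cohort_revenue REAL)"
)
conn.executemany(
    "INSERT INTO cohort VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("p1", "Excellent", 4.9, 600.0, 1000.0, 60.0),
        ("p2", "Excellent", 4.6, 400.0, 1000.0, 40.0),
    ],
)
clean = run_tests(conn, "cohort")

# A deliberately bad row: wrong tier, low rating, negative revenue and share.
conn.execute("INSERT INTO cohort VALUES ('p3', 'Good', 4.2, -5.0, 1000.0, -0.5)")
dirty = run_tests(conn, "cohort")
```

With the bad row present, every check except the cohort-total one reports one offending row, which is how a filter regression or bad source data would surface.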

Generated by Data Portal

Summary by CodeRabbit

  • New Features

    • Expanded product analytics: pricing, costs, margins, ratings, review counts, revenue and cohort metrics; new cohort report highlighting revenue contribution for "Excellent" products.
  • Documentation

    • Enriched product schema with data types, descriptions and a suite of validation/data-quality checks.
  • Behavioral Changes

    • Refined rating-tier cutoff for the top tier, affecting products classified as "Excellent."

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

Expands dim_products with many product metrics and tests, adjusts the Excellent rating threshold, and adds a new marketing_excellent_cohort_revenue model plus its schema and data-quality tests for cohort revenue analysis.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core schema additions (dbt/models/marts/core/schema.yml) | Adds many new columns to dim_products (product_name, product_category, unit_price, supply_cost, unit_margin, margin_pct, review_count, avg_rating, positive_reviews, negative_reviews, rating_tier with not_null and accepted_values, total_units_sold, product_total_revenue, product_gross_profit, created_at). Sets product_id data_type to varchar and adds a data_portal_tests suite for business-logic and data-quality checks. |
| Dim model logic tweak (dbt/models/marts/core/dim_products.sql) | Changes rating_tier bucket boundary: Excellent threshold raised from avg_rating >= 4.5 to avg_rating >= 4.8. |
| New marketing model + schema (dbt/models/marts/marketing/marketing_excellent_cohort_revenue.sql, dbt/models/marts/marketing/schema.yml) | Introduces marketing_excellent_cohort_revenue model selecting dim_products where rating_tier = 'Excellent', computing cohort_total_revenue and pct_of_cohort_revenue. Adds full schema block, column definitions, materialization config, meta, and multiple data_portal_tests validating ratings, revenue, and percentage ranges. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇🌿 I nibble code and count each row,
New columns bloom where numbers grow.
Excellent sellers hop to light,
Cohort revenue stitched up tight.
A rabbit cheers — metrics take flight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: adding a new gold-layer table (marketing_excellent_cohort_revenue) focused on Excellent-tier products, which aligns with the PR objectives and the four files modified. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
dbt/models/marts/marketing/schema.yml (1)

228-237: Harden cohort tests to fail on NULL metrics.

cohort_total_revenue and pct_of_cohort_revenue being NULL would currently pass these checks. Adding NULL guards makes the tests align with “must be positive / within 0–100.”

✅ Suggested test tightening
-          sql: "SELECT * FROM {{ model }} WHERE cohort_total_revenue <= 0"
+          sql: "SELECT * FROM {{ model }} WHERE cohort_total_revenue IS NULL OR cohort_total_revenue <= 0"
...
-          sql: "SELECT * FROM {{ model }} WHERE pct_of_cohort_revenue < 0 OR pct_of_cohort_revenue > 100"
+          sql: "SELECT * FROM {{ model }} WHERE pct_of_cohort_revenue IS NULL OR pct_of_cohort_revenue < 0 OR pct_of_cohort_revenue > 100"
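
To illustrate why the original predicates let NULL metrics through, here is a small sketch against an in-memory SQLite table. The table name cohort_model, the sample rows, and the helper failing_ids are hypothetical; only the WHERE clauses come from the suggestion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cohort_model ("
    "product_id TEXT, cohort_total_revenue REAL, pct_of_cohort_revenue REAL)"
)
conn.executemany(
    "INSERT INTO cohort_model VALUES (?, ?, ?)",
    [
        ("p1", 1000.0, 60.0),  # healthy row
        ("p2", None, None),    # metrics missing (NULL)
    ],
)


def failing_ids(where_clause):
    """Return product_ids selected by a test predicate (the failing rows)."""
    sql = f"SELECT product_id FROM cohort_model WHERE {where_clause} ORDER BY product_id"
    return [row[0] for row in conn.execute(sql)]


# NULL <= 0 evaluates to NULL, which WHERE treats as not true, so the row with
# missing metrics is never returned and the original test passes.
original_total = failing_ids("cohort_total_revenue <= 0")
hardened_total = failing_ids(
    "cohort_total_revenue IS NULL OR cohort_total_revenue <= 0"
)

original_pct = failing_ids(
    "pct_of_cohort_revenue < 0 OR pct_of_cohort_revenue > 100"
)
hardened_pct = failing_ids(
    "pct_of_cohort_revenue IS NULL OR pct_of_cohort_revenue < 0 "
    "OR pct_of_cohort_revenue > 100"
)
```

In this sketch only the hardened predicates return p2, so only they would turn a NULL metric into a test failure.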
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/marketing/schema.yml` around lines 228 - 237, Update the
failing SQL tests so NULL values are treated as failures: in the
"cohort_total_revenue is positive" test (referring to cohort_total_revenue and
{{ model }}) add a condition to the WHERE that also selects rows where
cohort_total_revenue IS NULL (e.g., OR cohort_total_revenue IS NULL) alongside
cohort_total_revenue <= 0; similarly update the "pct_of_cohort_revenue is within
valid range" test to also select rows where pct_of_cohort_revenue IS NULL (e.g.,
OR pct_of_cohort_revenue IS NULL) in addition to the existing < 0 OR > 100
checks so NULL metrics cause the test to fail.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1fca229 and d5811a3.

📒 Files selected for processing (3)
  • dbt/models/marts/core/schema.yml
  • dbt/models/marts/marketing/marketing_excellent_cohort_revenue.sql
  • dbt/models/marts/marketing/schema.yml

Comment on lines +149 to +156
- name: product_total_revenue
  description: >
    Total gross revenue generated by the product (unit_price × units_sold).
    This is the primary revenue metric consumed by the downstream
    marketing_excellent_cohort_revenue model, where it is aggregated via
    a window function to produce cohort_total_revenue and
    pct_of_cohort_revenue for all products classified in the Excellent
    rating tier (avg_rating >= 4.5).

⚠️ Potential issue | 🟡 Minor

Clarify revenue description to match column naming.

The description references “units_sold,” but the column is total_units_sold. This is a minor doc inconsistency that can confuse consumers.

📝 Suggested doc fix
-          Total gross revenue generated by the product (unit_price × units_sold).
+          Total gross revenue generated by the product (unit_price × total_units_sold).

Suggested change
- - name: product_total_revenue
-   description: >
-     Total gross revenue generated by the product (unit_price × units_sold).
-     This is the primary revenue metric consumed by the downstream
-     marketing_excellent_cohort_revenue model, where it is aggregated via
-     a window function to produce cohort_total_revenue and
-     pct_of_cohort_revenue for all products classified in the Excellent
-     rating tier (avg_rating >= 4.5).
+ - name: product_total_revenue
+   description: >
+     Total gross revenue generated by the product (unit_price × total_units_sold).
+     This is the primary revenue metric consumed by the downstream
+     marketing_excellent_cohort_revenue model, where it is aggregated via
+     a window function to produce cohort_total_revenue and
+     pct_of_cohort_revenue for all products classified in the Excellent
+     rating tier (avg_rating >= 4.5).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/core/schema.yml` around lines 149 - 156, Update the
description for the product_total_revenue metric to reference the actual column
name total_units_sold (instead of units_sold) and adjust the formula wording
accordingly (e.g., "unit_price × total_units_sold"); keep the rest of the
contextual note about downstream model aggregation and Excellent rating tier
unchanged.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dbt/models/marts/core/dim_products.sql (1)

27-32: ⚠️ Potential issue | 🟠 Major

Breaking change to public tier-1 model without notification or documentation.

The Excellent threshold has been raised from 4.5 to 4.8, which silently reclassifies products with avg_rating ∈ [4.5, 4.8) from Excellent to Good. This directly impacts marketing_excellent_cohort_revenue.sql, which filters where rating_tier = 'Excellent' — that model will now exclude products previously included, narrowing its cohort without any warning to consumers.

The problem is compounded by a documented contradiction:

  • marketing_excellent_cohort_revenue.sql header states: "Revenue analysis for products in the Excellent rating cohort (avg_rating >= 4.5)"
  • The actual implementation is now avg_rating >= 4.8

Required actions:

  1. Resolve the threshold mismatch: clarify whether the intent is >= 4.5 (update code) or >= 4.8 (update comment and downstream models).
  2. Notify the configured listeners (sofia.gutierrez, marcus.chen, jordan.blake) of this breaking change and its impact on marketing_excellent_cohort_revenue.
  3. Add a changelog entry or data-contract documentation describing the semantic break and cohort narrowing.
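
A rough way to see the cohort narrowing is to evaluate the tier rule at both cutoffs over the same ratings. The sketch below does this in an in-memory SQLite database. The sample ratings and the helper excellent_cohort are hypothetical, and the CASE expression is simplified to two buckets because the other tier boundaries in dim_products.sql are not shown in this review.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id TEXT, avg_rating REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("p1", 4.9), ("p2", 4.8), ("p3", 4.7), ("p4", 4.5), ("p5", 4.2)],
)

# Simplified stand-in for the rating_tier CASE expression; only the
# Excellent boundary is modeled, everything else is lumped together.
TIER_SQL = """
SELECT product_id
FROM (
    SELECT
        product_id,
        CASE WHEN avg_rating >= :cutoff THEN 'Excellent'
             ELSE 'Not Excellent' END AS rating_tier
    FROM products
)
WHERE rating_tier = 'Excellent'
ORDER BY product_id
"""


def excellent_cohort(cutoff):
    """Return the product_ids classified as Excellent at the given cutoff."""
    return [row[0] for row in conn.execute(TIER_SQL, {"cutoff": cutoff})]


before = excellent_cohort(4.5)  # documented threshold
after = excellent_cohort(4.8)   # threshold after the change
dropped = sorted(set(before) - set(after))
```

Note that a test of the form avg_rating >= 4.5 on the downstream model would still pass after the change, since every row that meets 4.8 also meets 4.5, so that check alone is unlikely to surface the mismatch.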
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/core/dim_products.sql` around lines 27 - 32, The rating
bucket threshold change in dim_products.sql (case on r.avg_rating producing
rating_tier) raised "Excellent" from 4.5 to 4.8 and breaks downstream consumers
(notably marketing_excellent_cohort_revenue.sql which claims avg_rating >= 4.5);
decide which behavior is intended and then either (A) revert the rating_tier
rule to when r.avg_rating >= 4.5 to preserve existing contract, or (B) keep >=
4.8 but update the marketing_excellent_cohort_revenue.sql header and its filters
to >= 4.8; in either case update data-contract docs/changelog to record the
change, and notify listeners (sofia.gutierrez, marcus.chen, jordan.blake) about
the decision and impact on marketing_excellent_cohort_revenue.
♻️ Duplicate comments (1)
dbt/models/marts/core/schema.yml (1)

179-187: units_sold in the description still mismatches the column name total_units_sold.

Line 181 still reads unit_price × units_sold; the correct column name is total_units_sold.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/core/schema.yml` around lines 179 - 187, The description for
the metric product_total_revenue incorrectly refers to the column units_sold;
update the text to reference the actual column name total_units_sold (e.g.,
change "unit_price × units_sold" to "unit_price × total_units_sold") in the
product_total_revenue description so it matches the schema and downstream
references.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between d5811a3 and ad748b1.

📒 Files selected for processing (2)
  • dbt/models/marts/core/dim_products.sql
  • dbt/models/marts/core/schema.yml

Comment on lines +107 to +111
- name: "Positive plus negative reviews do not exceed total review count"
  type: "business_logic"
  sql: "SELECT * FROM {{ model }} WHERE (positive_reviews + negative_reviews) > review_count"
  expected_result: "0 rows"
  description: "The sum of positive and negative reviews must not exceed the total review_count, since the two sub-counts are a subset of all reviews."

⚠️ Potential issue | 🟡 Minor

NULL-unsafe arithmetic in review-count consistency test.

dim_products.sql uses LEFT JOIN reviews, so positive_reviews and negative_reviews can be NULL. When either operand is NULL, (positive_reviews + negative_reviews) evaluates to NULL and NULL > review_count is unknown — the row is silently excluded, making the test a false pass for products with no review data.

🛡️ Proposed fix
-          sql: "SELECT * FROM {{ model }} WHERE (positive_reviews + negative_reviews) > review_count"
+          sql: "SELECT * FROM {{ model }} WHERE (COALESCE(positive_reviews, 0) + COALESCE(negative_reviews, 0)) > COALESCE(review_count, 0)"
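
The difference between the two predicates can be checked with a small sketch against an in-memory SQLite table. The sample rows are hypothetical, and whether a NULL sub-count can coexist with a non-NULL review_count depends on how dim_products.sql builds these columns, which is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_products (product_id TEXT, positive_reviews INTEGER, "
    "negative_reviews INTEGER, review_count INTEGER)"
)
conn.executemany(
    "INSERT INTO dim_products VALUES (?, ?, ?, ?)",
    [
        ("ok", 3, 1, 5),                   # 4 <= 5, consistent
        ("too_many", 4, 3, 5),             # 7 > 5, inconsistent
        ("null_negative", 6, None, 5),     # 6 > 5, but one operand is NULL
        ("no_reviews", None, None, None),  # nothing joined from reviews
    ],
)

ORIGINAL = (
    "SELECT product_id FROM dim_products "
    "WHERE (positive_reviews + negative_reviews) > review_count "
    "ORDER BY product_id"
)
FIXED = (
    "SELECT product_id FROM dim_products "
    "WHERE (COALESCE(positive_reviews, 0) + COALESCE(negative_reviews, 0)) "
    "> COALESCE(review_count, 0) "
    "ORDER BY product_id"
)

# NULL + 6 is NULL and NULL > 5 is not true, so the original predicate
# skips the null_negative row even though its known sub-count exceeds the total.
original_failures = [row[0] for row in conn.execute(ORIGINAL)]
fixed_failures = [row[0] for row in conn.execute(FIXED)]
```

With COALESCE on all three columns, a product with no review data compares 0 > 0 and is not flagged, while the row with a NULL sub-count is now caught.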

Suggested change
- - name: "Positive plus negative reviews do not exceed total review count"
-   type: "business_logic"
-   sql: "SELECT * FROM {{ model }} WHERE (positive_reviews + negative_reviews) > review_count"
-   expected_result: "0 rows"
-   description: "The sum of positive and negative reviews must not exceed the total review_count, since the two sub-counts are a subset of all reviews."
+ - name: "Positive plus negative reviews do not exceed total review count"
+   type: "business_logic"
+   sql: "SELECT * FROM {{ model }} WHERE (COALESCE(positive_reviews, 0) + COALESCE(negative_reviews, 0)) > COALESCE(review_count, 0)"
+   expected_result: "0 rows"
+   description: "The sum of positive and negative reviews must not exceed the total review_count, since the two sub-counts are a subset of all reviews."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dbt/models/marts/core/schema.yml` around lines 107 - 111, The business_logic
test "Positive plus negative reviews do not exceed total review count" is
NULL-unsafe because positive_reviews and negative_reviews can be NULL from the
LEFT JOIN; update the test SQL to use COALESCE (or equivalent) so the arithmetic
treats NULLs as 0 (e.g., replace (positive_reviews + negative_reviews) with
(COALESCE(positive_reviews,0) + COALESCE(negative_reviews,0))) and consider
applying COALESCE to review_count as well to ensure the comparison correctly
flags rows where the summed subcounts exceed the total.
