This project demonstrates senior-level SQL analytics, modular query architecture, and executive-ready business storytelling using PostgreSQL 16.
This project simulates a real-world retail business analytics engagement using the Superstore transactional dataset.
The objective is to transform raw order-level data into executive-ready business insights through structured data modeling, defensive SQL practices, and advanced analytical techniques.
All analysis was performed using PostgreSQL 16 and pgAdmin 4.
- Source: Superstore transactional dataset
- ~9,994 raw staging records
- 1,862 cleaned order-line records after transformation
- Fact grain: Order Line Level
- Evaluate revenue growth and profitability trends (MoM, YoY, YTD)
- Identify margin drivers and loss-leading categories
- Analyze regional and shipping performance
- Determine revenue and profit concentration (Pareto 80/20)
- Segment customers using RFM methodology
- Generate executive-level strategic recommendations
- What are total revenue, profit, margin %, and order volume?
- Which categories and sub-categories drive profit vs. losses?
- How is revenue trending month-over-month and year-over-year?
- Are profits concentrated in a small percentage of customers or products?
- Which customers are high-value vs. at-risk?
During transformation, several real-world data integrity issues were identified and resolved:
- Detected inconsistent
product_id → product_namemappings across 32 product IDs - Implemented deterministic deduplication using
ROW_NUMBER()window functions - Preserved primary key integrity while retaining latest and most complete attribute values
- Ensured rerunnable and idempotent ETL design
- Applied defensive SQL (
NULLIF,COALESCE) to prevent divide-by-zero errors
This reflects production-style analytical data handling rather than academic querying.
- PostgreSQL 16
- CTE-based modular query architecture
- Advanced window functions (
LAG,RANK,SUM OVER) - Rolling averages and YTD calculations
- Defensive SQL (
NULLIF,COALESCE) - Indexing and constraint optimization
- Structured analytics layering (Core → Advanced → Executive)
This project was intentionally designed using production-style data engineering discipline:
- Idempotent and rerunnable transformation scripts
- Deterministic deduplication using
ROW_NUMBER()window functions - Explicit fact table grain definition (Order Line Level)
- Defensive data constraints (CHECK constraints for data integrity)
- Strategic indexing for performance optimization
- Modular query layering (Database Setup → Core Analysis → Advanced Analytics → Executive Insights)
This reflects analytical engineering thinking beyond ad-hoc querying and demonstrates foundational data warehouse and analytics architecture skills.
01_database_setup/ → schema creation, staging, cleaning, constraints
02_core_analysis/ → executive KPIs and dimensional breakdowns
03_advanced_analytics/ → MoM, YoY, rolling averages, Pareto, RFM
04_executive_insights/ → executive summary and strategic recommendations
assets/ → ERD and sample query outputs
To reproduce this analysis:
- Create database in PostgreSQL.
- Run scripts in
01_database_setup/in order. - Validate row counts and constraints.
- Execute modules in
02_core_analysis/. - Run advanced analytics in
03_advanced_analytics/. - Review executive summary in
04_executive_insights/.
All modules are modular and can run independently once base tables are created.
(Replace bracketed numbers with your actual outputs before final push.)
- Technology category drove [X]% of total revenue growth YoY, contributing $[X] incremental revenue.
- Furniture margins averaged [X]%, significantly lower than overall margin of [X]%, driven by discounting.
- Top 20% of customers generated approximately [X]% of total profit, confirming strong revenue concentration.
- Several sub-categories operated below [X]% margin, indicating pricing optimization opportunities.
- RFM segmentation identified [X] high-value repeat customers suitable for retention targeting.
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(sales) AS monthly_revenue,
LAG(SUM(sales)) OVER (ORDER BY DATE_TRUNC('month', order_date)) AS prior_month,
SUM(sales)
- LAG(SUM(sales)) OVER (ORDER BY DATE_TRUNC('month', order_date)) AS mom_change
FROM sales
GROUP BY 1
ORDER BY 1;This demonstrates layered CTE logic, window functions, and trend analysis.
- Fact table grain: Order Line Level
- Revenue = SUM(sales)
- Profit = SUM(profit)
- Margin % = profit / revenue
- Time-based metrics use
order_date - Advanced metrics built using CTE layering and window functions
This project demonstrates:
- Business-focused analytical storytelling
- Advanced SQL proficiency
- Data modeling discipline
- Executive-ready communication
- Senior-level analytical thinking