Skip to content

berhaneh24/SQL_Superstore_Business_Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQL Superstore Business Analysis (PostgreSQL 16)

This project demonstrates senior-level SQL analytics, modular query architecture, and executive-ready business storytelling using PostgreSQL 16.


Project Overview

This project simulates a real-world retail business analytics engagement using the Superstore transactional dataset.

The objective is to transform raw order-level data into executive-ready business insights through structured data modeling, defensive SQL practices, and advanced analytical techniques.

All analysis was performed using PostgreSQL 16 and pgAdmin 4.


Dataset

  • Source: Superstore transactional dataset
  • ~9,994 raw staging records
  • 1,862 cleaned order-line records after transformation
  • Fact grain: Order Line Level

Business Objectives

  • Evaluate revenue growth and profitability trends (MoM, YoY, YTD)
  • Identify margin drivers and loss-leading categories
  • Analyze regional and shipping performance
  • Determine revenue and profit concentration (Pareto 80/20)
  • Segment customers using RFM methodology
  • Generate executive-level strategic recommendations

Key Business Questions

  • What are total revenue, profit, margin %, and order volume?
  • Which categories and sub-categories drive profit vs. losses?
  • How is revenue trending month-over-month and year-over-year?
  • Are profits concentrated in a small percentage of customers or products?
  • Which customers are high-value vs. at-risk?

Data Quality Handling

During transformation, several real-world data integrity issues were identified and resolved:

  • Detected inconsistent product_id → product_name mappings across 32 product IDs
  • Implemented deterministic deduplication using ROW_NUMBER() window functions
  • Preserved primary key integrity while retaining latest and most complete attribute values
  • Ensured rerunnable and idempotent ETL design
  • Applied defensive SQL (NULLIF, COALESCE) to prevent divide-by-zero errors

This reflects production-style analytical data handling rather than academic querying.


Technical Highlights

  • PostgreSQL 16
  • CTE-based modular query architecture
  • Advanced window functions (LAG, RANK, SUM OVER)
  • Rolling averages and YTD calculations
  • Defensive SQL (NULLIF, COALESCE)
  • Indexing and constraint optimization
  • Structured analytics layering (Core → Advanced → Executive)

Architectural & Interview Talking Points

This project was intentionally designed using production-style data engineering discipline:

  • Idempotent and rerunnable transformation scripts
  • Deterministic deduplication using ROW_NUMBER() window functions
  • Explicit fact table grain definition (Order Line Level)
  • Defensive data constraints (CHECK constraints for data integrity)
  • Strategic indexing for performance optimization
  • Modular query layering (Database Setup → Core Analysis → Advanced Analytics → Executive Insights)

This reflects analytical engineering thinking beyond ad-hoc querying and demonstrates foundational data warehouse and analytics architecture skills.


Repository Structure

01_database_setup/ → schema creation, staging, cleaning, constraints
02_core_analysis/ → executive KPIs and dimensional breakdowns
03_advanced_analytics/ → MoM, YoY, rolling averages, Pareto, RFM
04_executive_insights/ → executive summary and strategic recommendations
assets/ → ERD and sample query outputs


Execution Order

To reproduce this analysis:

  1. Create database in PostgreSQL.
  2. Run scripts in 01_database_setup/ in order.
  3. Validate row counts and constraints.
  4. Execute modules in 02_core_analysis/.
  5. Run advanced analytics in 03_advanced_analytics/.
  6. Review executive summary in 04_executive_insights/.

All modules are modular and can run independently once base tables are created.


Sample Executive Insights

(Replace bracketed numbers with your actual outputs before final push.)

  • Technology category drove [X]% of total revenue growth YoY, contributing $[X] incremental revenue.
  • Furniture margins averaged [X]%, significantly lower than overall margin of [X]%, driven by discounting.
  • Top 20% of customers generated approximately [X]% of total profit, confirming strong revenue concentration.
  • Several sub-categories operated below [X]% margin, indicating pricing optimization opportunities.
  • RFM segmentation identified [X] high-value repeat customers suitable for retention targeting.

Example Advanced SQL (Window Function Usage)

SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(sales) AS monthly_revenue,
    LAG(SUM(sales)) OVER (ORDER BY DATE_TRUNC('month', order_date)) AS prior_month,
    SUM(sales) 
        - LAG(SUM(sales)) OVER (ORDER BY DATE_TRUNC('month', order_date)) AS mom_change
FROM sales
GROUP BY 1
ORDER BY 1;

This demonstrates layered CTE logic, window functions, and trend analysis.


Data Model Notes

  • Fact table grain: Order Line Level
  • Revenue = SUM(sales)
  • Profit = SUM(profit)
  • Margin % = profit / revenue
  • Time-based metrics use order_date
  • Advanced metrics built using CTE layering and window functions

Strategic Value

This project demonstrates:

  • Business-focused analytical storytelling
  • Advanced SQL proficiency
  • Data modeling discipline
  • Executive-ready communication
  • Senior-level analytical thinking

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors