Mohith-akash/olist-analytics-platform

🛒 Olist E-commerce Analytics Platform

Lakehouse analytics with Databricks, Delta Lake & Streamlit

Live Dashboard · Dataset


🎯 Overview

An end-to-end analytics platform that analyzes 100,000+ orders from the Brazilian e-commerce marketplace Olist (2016-2018). It is built to demonstrate:

  • Lakehouse Architecture - Databricks with Delta Lake storage
  • Medallion Pattern - Bronze → Silver → Gold data layers
  • SQL Expertise - Complex transformations, CTEs, JOINs
  • Data Visualization - Interactive Streamlit dashboard
  • CI/CD - GitHub Actions for linting and testing

📊 Dashboard Preview

KPIs & Insights

[Screenshot: Dashboard Hero]

Charts & Analytics

[Screenshot: Dashboard Charts]


🛠️ Tech Stack


Lakehouse Platform   Databricks
Storage Format       Delta Lake
Web Dashboard        Streamlit
Backend              Python

🏗️ Architecture

                    ┌─────────────────────────────────────────────────────┐
                    │              Databricks Lakehouse                   │
                    │                                                     │
CSV Files ─────────►│  ┌──────────┐   ┌──────────┐   ┌──────────────┐    │
                    │  │  Bronze  │──►│  Silver  │──►│    Gold      │    │
                    │  │  (raw)   │   │ (clean)  │   │ (analytics)  │    │
                    │  │ 9 tables │   │ 7 tables │   │  4 tables    │    │
                    │  └──────────┘   └──────────┘   └──────────────┘    │
                    │                                       │            │
                    │              Delta Lake Storage       │            │
                    └───────────────────────────────────────┼────────────┘
                                                            │
                                                            ▼
                                                    ┌──────────────┐
                                                    │  Streamlit   │
                                                    │  Dashboard   │
                                                    └──────────────┘

Medallion Architecture

Layer    Tables     Description
Bronze   9 tables   Raw data ingested from CSV files
Silver   7 tables   Cleaned, typed, and validated data
Gold     4 tables   Business-ready facts and dimensions
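
To make the Bronze → Silver step concrete, here is a minimal pure-Python sketch of what "cleaned, typed, and validated" can mean for an orders table. It is an illustration only, not the project's SQL notebooks: the sample rows, the to_silver and parse_ts helpers, and the purchased_at output column are all hypothetical, and the input column names are assumed to follow the Olist orders CSV.

```python
from datetime import datetime
from typing import Optional

# Hypothetical Bronze rows: every value is still a string, as read from CSV.
bronze_orders = [
    {"order_id": "a1", "order_status": "delivered",
     "order_purchase_timestamp": "2017-10-02 10:56:33"},
    {"order_id": "a1", "order_status": "delivered",
     "order_purchase_timestamp": "2017-10-02 10:56:33"},  # exact duplicate
    {"order_id": "", "order_status": "shipped",
     "order_purchase_timestamp": "2018-01-15 08:00:00"},  # missing key
    {"order_id": "b2", "order_status": " Shipped ",
     "order_purchase_timestamp": "not a date"},           # bad timestamp
]


def parse_ts(value: str) -> Optional[datetime]:
    """Return a datetime, or None when the raw string is not a valid timestamp."""
    try:
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None


def to_silver(rows):
    """Drop rows without a key, deduplicate on order_id, normalize and cast."""
    seen = set()
    silver = []
    for row in rows:
        order_id = row["order_id"].strip()
        if not order_id or order_id in seen:
            continue
        seen.add(order_id)
        silver.append({
            "order_id": order_id,
            "order_status": row["order_status"].strip().lower(),
            "purchased_at": parse_ts(row["order_purchase_timestamp"]),
        })
    return silver


silver_orders = to_silver(bronze_orders)
for row in silver_orders:
    print(row)
```

In the actual platform this logic would live in SQL against Delta tables; the sketch only shows the kind of rules a Silver layer typically enforces.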

Data Models (Gold Layer)

Model           Description
fct_orders      Order facts with revenue metrics
dim_customers   Customer dimension with segmentation
dim_products    Product dimension with sales tiers
dim_sellers     Seller dimension with performance ratings
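
As a rough illustration of how an order fact table with revenue metrics can be derived, the sketch below aggregates item-level rows into one row per order. It is an assumption-laden example rather than the repository's Gold-layer SQL: the build_fct_orders function, the sample data, and the metric names (item_count, item_revenue, freight_revenue, total_revenue) are hypothetical, while price and freight_value are assumed to match the Olist order items file.

```python
from collections import defaultdict

# Hypothetical Silver rows: one per order item, already typed as floats.
silver_order_items = [
    {"order_id": "a1", "price": 100.0, "freight_value": 10.0},
    {"order_id": "a1", "price": 50.0, "freight_value": 5.0},
    {"order_id": "b2", "price": 20.0, "freight_value": 7.5},
]


def build_fct_orders(items):
    """Aggregate item rows into one fact row per order with revenue metrics."""
    totals = defaultdict(
        lambda: {"item_count": 0, "item_revenue": 0.0, "freight_revenue": 0.0}
    )
    for item in items:
        agg = totals[item["order_id"]]
        agg["item_count"] += 1
        agg["item_revenue"] += item["price"]
        agg["freight_revenue"] += item["freight_value"]
    # Sort by order_id so the output order is deterministic.
    return [
        {
            "order_id": order_id,
            **agg,
            "total_revenue": agg["item_revenue"] + agg["freight_revenue"],
        }
        for order_id, agg in sorted(totals.items())
    ]


fct_orders = build_fct_orders(silver_order_items)
for row in fct_orders:
    print(row)
```

The SQL equivalent would be a GROUP BY over the order items table joined to orders; the grain (one row per order) is the design choice that matters here.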

🚀 Quick Start

1. Clone & Setup Environment

git clone https://github.com/Mohith-akash/olist-analytics-platform.git
cd olist-analytics-platform

python -m venv venv
.\venv\Scripts\activate        # Windows
source venv/bin/activate       # Mac/Linux

pip install -r requirements.txt

2. Configure Databricks Connection

Create .streamlit/secrets.toml:

DATABRICKS_HOST = "your-workspace.cloud.databricks.com"
DATABRICKS_HTTP_PATH = "/sql/1.0/warehouses/your-warehouse-id"
DATABRICKS_TOKEN = "your-access-token"
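
For orientation, the three secrets above map directly onto the arguments of connect() in the databricks-sql-connector package. The sketch below shows one way a module such as app/database.py might do this; the actual implementation in the repository may differ, and build_connect_kwargs and get_connection are hypothetical names.

```python
from typing import Mapping

REQUIRED_KEYS = ("DATABRICKS_HOST", "DATABRICKS_HTTP_PATH", "DATABRICKS_TOKEN")


def build_connect_kwargs(secrets: Mapping[str, str]) -> dict:
    """Translate the secrets.toml keys into connect() keyword arguments."""
    missing = [key for key in REQUIRED_KEYS if key not in secrets or not secrets[key]]
    if missing:
        raise KeyError(f"Missing secrets: {', '.join(missing)}")
    # server_hostname expects a bare hostname, so strip a scheme and
    # trailing slash if someone pasted a full URL.
    host = secrets["DATABRICKS_HOST"].removeprefix("https://").rstrip("/")
    return {
        "server_hostname": host,
        "http_path": secrets["DATABRICKS_HTTP_PATH"],
        "access_token": secrets["DATABRICKS_TOKEN"],
    }


def get_connection(secrets: Mapping[str, str]):
    """Open a Databricks SQL connection (requires databricks-sql-connector)."""
    # Imported lazily so the helper above can be used and tested without
    # the connector installed.
    from databricks import sql

    return sql.connect(**build_connect_kwargs(secrets))
```

Inside the Streamlit app, get_connection(st.secrets) would be the likely call site, since st.secrets exposes the values from .streamlit/secrets.toml as a mapping.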

3. Run the Dashboard

streamlit run streamlit_app.py

📁 Project Structure

olist-analytics-platform/
├── 📊 streamlit_app.py              # Dashboard entry point
├── 📋 requirements.txt              # Python dependencies
│
├── 📂 app/                          # Core modules
│   ├── database.py                  # Databricks SQL connection
│   ├── styles.py                    # CSS styling
│   └── utils.py                     # Formatting utilities
│
├── 📂 tabs/                         # Dashboard components
│   ├── home.py                      # KPIs and overview
│   ├── analytics.py                 # Analysis charts
│   ├── query.py                     # Data explorer
│   └── about.py                     # Project info
│
├── 📂 databricks/                   # SQL notebooks (reference)
│   ├── 01_bronze_layer.sql
│   ├── 02_silver_layer.sql
│   └── 03_gold_layer.sql
│
└── 📂 docs/images/                  # Screenshots
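
As an example of the kind of helper a formatting module like app/utils.py might contain, the sketch below shortens large numbers for KPI cards. The function names format_compact and format_brl are hypothetical and not taken from the repository; the R$ prefix assumes amounts are in Brazilian reais, as in the Olist dataset.

```python
def format_compact(value: float) -> str:
    """Shorten large numbers for display, e.g. 1234567 -> '1.2M'."""
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if abs(value) >= threshold:
            return f"{value / threshold:.1f}{suffix}"
    return f"{value:.0f}"


def format_brl(value: float) -> str:
    """Format an amount in Brazilian reais using the compact notation."""
    return f"R$ {format_compact(value)}"


print(format_compact(1234567))
print(format_brl(15_800_000.0))
```

Keeping helpers like these separate from the tab modules lets each dashboard tab present metrics consistently.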

📚 Dataset

Olist Brazilian E-commerce Dataset (Kaggle): 100K+ orders · 9 tables · 2016-2018


Built by Mohith Akash

⭐ Star this repo if you found it helpful!
