Welcome to my Data Warehouse and Analytics Project! 🚀
This repository showcases an end-to-end data warehouse built from scratch in PostgreSQL, adapted from Baraa Salkini's SQL Server-based YouTube tutorial (https://youtu.be/9GVqKuTVANE?si=HATxStdwdLw7oyb1). It demonstrates modern data engineering practices, including a medallion architecture, ETL pipelines, and analytical queries, tailored for Postgres. The CSV files and the data-flow and schema diagrams are the work of Baraa Salkini and are used here for educational purposes only.
This project builds a scalable data warehouse to consolidate and analyze sales data. Key components include:
- Data Architecture: Implements a medallion architecture with Bronze, Silver, and Gold layers (a minimal setup sketch follows this list).
- Bronze: Raw data ingested from CSV files (ERP and CRM sources).
- Silver: Cleansed, standardized, and normalized data for processing.
- Gold: Star-schema data model optimized for analytics.
- ETL Pipelines: Scripts for extracting, transforming, and loading data into Postgres.
- Data Modeling: Fact and dimension tables designed for efficient querying.
- Analytics: SQL queries that uncover insights into customer behavior and demographics, product performance, and sales trends.
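As a rough sketch of how the layers could be set up in Postgres (the table, its columns, and the file path below are illustrative assumptions; the real DDL lives in `scripts/`):

```sql
-- One schema per medallion layer.
CREATE SCHEMA IF NOT EXISTS bronze;
CREATE SCHEMA IF NOT EXISTS silver;
CREATE SCHEMA IF NOT EXISTS gold;

-- Hypothetical Bronze landing table; columns mirror the raw CRM CSV as-is.
CREATE TABLE IF NOT EXISTS bronze.crm_cust_info (
    cst_id          INT,
    cst_key         VARCHAR(50),
    cst_firstname   VARCHAR(50),
    cst_lastname    VARCHAR(50),
    cst_create_date DATE
);

-- Bulk-load the raw CSV without transformation (path is illustrative).
COPY bronze.crm_cust_info
FROM '/path/to/datasets/source_crm/cust_info.csv'
WITH (FORMAT csv, HEADER true);
```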
Highlights of the Postgres adaptation:
- Adapted SQL Server scripts to PostgreSQL.
- Optimized query performance for large datasets using Postgres indexes and partitioning (a sketch follows this list).
- Implemented a star schema in the Gold layer for business-ready analytics.
- Clean, modular SQL scripts with clear documentation.
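A minimal sketch of the indexing and partitioning approach, assuming a hypothetical Gold-layer fact table (names are illustrative, not the project's actual DDL):

```sql
-- Illustrative fact table, range-partitioned by order date.
CREATE TABLE gold.fact_sales (
    order_number VARCHAR(50),
    customer_key INT,
    product_key  INT,
    order_date   DATE NOT NULL,
    sales_amount NUMERIC(10,2)
) PARTITION BY RANGE (order_date);

-- One partition per year keeps single-year scans small.
CREATE TABLE gold.fact_sales_2013
    PARTITION OF gold.fact_sales
    FOR VALUES FROM ('2013-01-01') TO ('2014-01-01');

-- Index the join key the analytical queries filter on.
CREATE INDEX idx_fact_sales_customer ON gold.fact_sales (customer_key);
```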
This project is ideal for showcasing skills in:
- Data Engineering (ETL pipelines, data warehousing)
- SQL Development (Postgres-specific optimizations)
- Data Modeling (star schema, medallion architecture)
- Data Analytics (actionable insights)
- PostgreSQL: Database for hosting the warehouse.
- DBeaver: GUI for database management.
- GitHub: Version control and project hosting.
data-warehouse-project/
│
├── datasets/ # Raw datasets used for the project (ERP and CRM data)
│
├── docs/ # Project documentation and architecture details
│ ├── etl.drawio # Draw.io file showing the ETL techniques and methods
│ ├── data_architecture.drawio # Draw.io file showing the project's architecture
│ ├── data_catalog.md # Catalog of datasets, including field descriptions and metadata
│ ├── data_flow.drawio # Draw.io file for the data flow diagram
│ ├── data_models.drawio # Draw.io file for data models (star schema)
│ ├── naming-conventions.md # Consistent naming guidelines for tables, columns, and files
│
├── scripts/ # SQL scripts for ETL and transformations
│ ├── bronze/ # Scripts for extracting and loading raw data
│ ├── silver/ # Scripts for cleaning and transforming data
│ ├── gold/ # Scripts for creating analytical models
│
├── tests/ # Test scripts and data quality checks
│
├── README.md # Project overview and instructions
├── LICENSE # License information for the repository
├── .gitignore # Files and directories to be ignored by Git
└── requirements.txt # Dependencies and requirements for the project
- Prerequisites:
  - Install PostgreSQL and a GUI such as DBeaver.
  - Clone this repository.
- Setup:
  - Create a Postgres database: `CREATE DATABASE warehouse;`
  - Run the scripts in order: `bronze/*.sql`, `silver/*.sql`, `gold/*.sql`.
  - Load the sample data from `datasets/` (see `docs/data_catalog.md` for details).
- Explore:
  - Run the analytical queries in `gold/queries.sql` for insights.
  - View the diagrams in `docs/` for the architecture and data flow.
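As a taste of the analysis in `gold/queries.sql`, here is a minimal sketch of a monthly sales-trend query; the `gold.fact_sales` table and its columns are illustrative assumptions, not the project's exact names.

```sql
-- Monthly revenue and distinct buyers from the Gold star schema.
SELECT
    DATE_TRUNC('month', f.order_date) AS order_month,
    SUM(f.sales_amount)               AS total_sales,
    COUNT(DISTINCT f.customer_key)    AS unique_customers
FROM gold.fact_sales AS f
GROUP BY DATE_TRUNC('month', f.order_date)
ORDER BY order_month;
```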
For detailed requirements, see docs/requirements.md.
This project adapts Baraa's original SQL Server implementation to PostgreSQL, addressing challenges such as the following (a before/after sketch appears after the list):
- Converting T-SQL’s `IDENTITY` to Postgres `SERIAL`.
- Adjusting date/time functions (e.g., `GETDATE()` to `NOW()`).
- Optimizing queries with Postgres-specific indexing.
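To make the port concrete, here is a minimal before/after sketch; the table and column names are hypothetical.

```sql
-- T-SQL original (SQL Server), shown as a comment:
-- CREATE TABLE silver.crm_cust_info (
--     sk              INT IDENTITY(1,1),
--     dwh_create_date DATETIME DEFAULT GETDATE()
-- );

-- PostgreSQL port: IDENTITY becomes SERIAL, GETDATE() becomes NOW().
CREATE TABLE silver.crm_cust_info (
    sk              SERIAL,
    dwh_create_date TIMESTAMP DEFAULT NOW()
);
```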
- Inspired by Baraa Salkini’s SQL Data Warehouse tutorial.
- Datasets provided by the tutorial, used with permission for educational purposes.
Check out my LinkedIn to follow my data engineering journey!
Feel free to star this repo, share feedback, or suggest improvements.
License: MIT

Note: The sample datasets are for demonstration only. Ensure compliance with data usage policies before sharing.