This is the final project for the Data Warehouses course. It simulates a comprehensive, real-world data engineering workflow, covering every stage from data ingestion to the design and implementation of a data warehouse. The focus is on building a PostgreSQL-based data warehouse, transforming raw data into meaningful insights, and generating statistical reports.
In this project, I extracted data from an external source (a dataset), transformed it using Python, and structured it into a star schema within PostgreSQL. The project showcases the technical aspects of data ingestion and transformation, and demonstrates how to generate insightful analytics from the warehouse.
- Clone this repository.
- Install dependencies: `pip install -r requirements.txt`
- Run the scripts in the following order: `scripts/ingestion.py`, then `scripts/transformation.py`.
The data warehouse uses a star schema with the following tables:
- Fact Table: `fact_reviews`
- Dimension Tables: `dim_hosts`, `dim_dates`, `dim_listings`
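
To make the shape of the star schema concrete, here is a minimal sketch of how the four tables could relate. Only the table names come from this README; every column name (`host_name`, `full_date`, `listing_name`, and the key columns) is an assumption for illustration, and the sketch uses an in-memory SQLite database as a stand-in so that it runs without a PostgreSQL server. The actual schema in this project may differ.

```python
import sqlite3

# Hypothetical columns: only the four table names are taken from the README.
DDL = """
CREATE TABLE dim_hosts (
    host_id INTEGER PRIMARY KEY,
    host_name TEXT
);
CREATE TABLE dim_dates (
    date_id INTEGER PRIMARY KEY,
    full_date TEXT
);
CREATE TABLE dim_listings (
    listing_id INTEGER PRIMARY KEY,
    listing_name TEXT
);
-- The fact table holds one row per review and points at each dimension.
CREATE TABLE fact_reviews (
    review_id INTEGER PRIMARY KEY,
    listing_id INTEGER REFERENCES dim_listings (listing_id),
    host_id INTEGER REFERENCES dim_hosts (host_id),
    date_id INTEGER REFERENCES dim_dates (date_id)
);
"""


def build_schema(conn):
    """Create the fact table and its three dimension tables."""
    conn.executescript(DDL)


def reviews_per_host(conn):
    """Count reviews per host by joining the fact table to dim_hosts."""
    cur = conn.execute(
        """
        SELECT h.host_name, COUNT(*) AS review_count
        FROM fact_reviews AS f
        JOIN dim_hosts AS h ON h.host_id = f.host_id
        GROUP BY h.host_name
        ORDER BY h.host_name
        """
    )
    return cur.fetchall()


# Made-up sample rows, used only to exercise the join.
conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.executemany("INSERT INTO dim_hosts VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.execute("INSERT INTO dim_dates VALUES (?, ?)", (20240101, "2024-01-01"))
conn.execute("INSERT INTO dim_listings VALUES (?, ?)", (10, "Loft"))
conn.executemany(
    "INSERT INTO fact_reviews VALUES (?, ?, ?, ?)",
    [(100, 10, 1, 20240101), (101, 10, 1, 20240101), (102, 10, 2, 20240101)],
)
print(reviews_per_host(conn))  # [('Ana', 2), ('Ben', 1)]
```

The point of the layout is that analytical queries join the central fact table to whichever dimensions they need, as `reviews_per_host` does with `dim_hosts`.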
- Clone the repository: `git clone https://github.com/malakShehada/Data_Warehouse_Final_Project.git`
- Navigate to the project directory: `cd Data_Warehouse_Final_Project`
- Install required Python packages: `pip install -r requirements.txt`
- Run the scripts in the following order:
  - `ingestion.py` to ingest raw data.
  - `transformation.py` to clean and transform the data.
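
Because the run order matters, one option is to wrap the two scripts in a small runner. The sketch below is a hypothetical helper, not part of this repository: `run_pipeline` and `PIPELINE` are names introduced here for illustration, and the script paths assume the `scripts/` directory mentioned earlier.

```python
import subprocess
import sys

# Script paths as listed in this README; the runner itself is hypothetical.
PIPELINE = ["scripts/ingestion.py", "scripts/transformation.py"]


def run_pipeline(scripts=PIPELINE, run=subprocess.run):
    """Run each script in order, stopping at the first failure."""
    completed = []
    for script in scripts:
        # check=True makes subprocess.run raise CalledProcessError
        # on a non-zero exit code, so transformation never runs
        # after a failed ingestion.
        run([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the project root would execute ingestion first and transformation second, using the same Python interpreter that runs the helper.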
Key insights can be found in reports/stats_report.md.