This project simulates a real-world ETL pipeline for cleaning, merging, and auditing customer data from multiple sources.
- Microsoft SQL Server (Docker)
- T-SQL (`sqlcmd`)
- Bulk insert from CSV
- Stored procedures for automation
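SQL Server runs in a local Docker container; a typical way to start one is shown below (the image tag and password are placeholders — use the same password you later pass to `sqlcmd`):

```bash
# Start SQL Server 2022 in Docker (the SA password must meet SQL Server's complexity rules)
docker run -d --name mssql \
  -e "ACCEPT_EULA=Y" \
  -e "MSSQL_SA_PASSWORD=YourStrongPassword" \
  -p 1433:1433 \
  mcr.microsoft.com/mssql/server:2022-latest
```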
- `data/`: Sample input data (`customer_data.csv`)
- `sql/`: SQL scripts for table creation, ETL logic, and testing
- `screenshots/`: Optional screenshots or dashboard previews
- `README.md`: Project overview
- Load raw customer data into `Staging_Customers`
- Clean inconsistent city names (e.g. `new york` → `New York`)
- Merge data into `Dim_Customers` using UPSERT logic
- Log each ETL run in `ETL_Job_Log`
- Clear the staging table
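The UPSERT step above can be sketched with a T-SQL `MERGE`; the column names (`CustomerID`, `City`, `IsActive`) are illustrative and may differ from the actual schema in `sql/`:

```sql
-- Illustrative UPSERT from staging into the dimension table
MERGE Dim_Customers AS target
USING Staging_Customers AS source
    ON target.CustomerID = source.CustomerID
WHEN MATCHED THEN
    UPDATE SET target.City     = source.City,
               target.IsActive = source.IsActive
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, City, IsActive)
    VALUES (source.CustomerID, source.City, source.IsActive);
```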
- Active vs inactive customer count
- Top cities by customer count
- ETL run history
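Each of these checks is a simple aggregate; the column names below (`IsActive`, `City`, and the `ETL_Job_Log` columns such as `RunDate`) are assumptions about the schema:

```sql
-- Active vs inactive customer count
SELECT IsActive, COUNT(*) AS CustomerCount
FROM Dim_Customers
GROUP BY IsActive;

-- Top cities by customer count
SELECT TOP 5 City, COUNT(*) AS CustomerCount
FROM Dim_Customers
GROUP BY City
ORDER BY CustomerCount DESC;

-- ETL run history, most recent first
SELECT *
FROM ETL_Job_Log
ORDER BY RunDate DESC;
```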
```bash
# Connect to the DB
sqlcmd -S localhost -U SA -P 'YourStrongPassword' -d Company_Datawarehouse
```

Then, inside the `sqlcmd` session:

```sql
-- Create tables
:r sql/create_tables.sql

-- Load your CSV via BULK INSERT or bcp

-- Run ETL
EXEC Run_Customer_ETL;
GO

-- Run test queries
:r sql/test_queries.sql
GO
```
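The CSV load step can be sketched with `BULK INSERT`; the file path is an assumption (the CSV must be visible from inside the container, e.g. via a bind mount or `docker cp`):

```sql
-- Load the sample CSV into the staging table
BULK INSERT Staging_Customers
FROM '/var/opt/mssql/data/customer_data.csv'
WITH (
    FORMAT = 'CSV',        -- CSV parsing requires SQL Server 2017+
    FIRSTROW = 2,          -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);
```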