Master of Science in Data Science: Capstone Course
A culminating project demonstrating business strategy, data modeling, and technical implementation applied to a real-world use case.
This repository contains the full deliverables for our MSDS Capstone project, completed as part of the final capstone requirement for the Master of Science in Data Science program. The project integrates skills across the three core pillars of the MSDS curriculum:
| Pillar | Focus Areas |
|---|---|
| Business | Strategic thinking, consulting, stakeholder communication, business planning |
| Modeling | Statistical analysis, machine learning, model evaluation, and insights |
| Information Technology | Data pipelines, system design, implementation, and deployment |
Briefly describe the industry, problem statement, and business context here.
- Industry: [e.g., Healthcare / Finance / Retail / etc.]
- Problem Statement: [1–2 sentences describing the core business problem]
- Strategic Objective: [What competitive or operational advantage does this project deliver?]
| Name | Role |
|---|---|
| Name | Project Lead / Business Strategy |
| Name | Data Engineer / Pipeline Development |
| Name | ML Modeling & Evaluation |
| Name | Visualization & Communication |
capstone-project/
├── data/
│   ├── raw/ # Original, unmodified data sources
│   ├── processed/ # Cleaned and transformed datasets
│   └── external/ # Third-party or supplementary data
├── notebooks/
│   ├── 01_eda.ipynb # Exploratory Data Analysis
│   ├── 02_preprocessing.ipynb
│   ├── 03_modeling.ipynb
│   └── 04_evaluation.ipynb
├── src/
│   ├── data/ # Data ingestion and processing scripts
│   ├── models/ # Model training and inference code
│   └── utils/ # Helper functions and utilities
├── reports/
│   ├── business_plan.pdf # Business case and strategic plan
│   ├── implementation_plan.pdf
│   └── final_presentation.pdf
├── dashboards/ # Visualization and reporting artifacts
├── requirements.txt
├── environment.yml
└── README.md
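The layout above keeps raw, processed, and external data in separate folders. One way to stop scripts in `src/` from hard-coding those locations is a small path helper. The sketch below assumes a hypothetical module (for example `src/utils/paths.py`) with hypothetical functions `data_paths` and `ensure_data_dirs`; it uses only the standard library and is not part of the repository as described.

```python
from pathlib import Path
from typing import Dict

# Sub-folders of data/ as shown in the repository layout.
# Hypothetical helper module, e.g. src/utils/paths.py (not in the original repo).
DATA_SUBDIRS = ("raw", "processed", "external")


def data_paths(project_root: Path) -> Dict[str, Path]:
    """Map each data sub-folder name to its path under project_root/data."""
    data_dir = Path(project_root) / "data"
    return {name: data_dir / name for name in DATA_SUBDIRS}


def ensure_data_dirs(project_root: Path) -> Dict[str, Path]:
    """Create any missing data sub-folders and return their paths.

    parents=True also creates data/ itself if needed, and exist_ok=True
    makes repeated calls safe.
    """
    paths = data_paths(project_root)
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths
```

A preprocessing script could then read from `data_paths(root)["raw"]` and write to `data_paths(root)["processed"]`, resolving `root` once (for example with `Path.cwd()` when run from the repository root).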
- Business Understanding: Defined the problem scope, KPIs, and success criteria in collaboration with stakeholders.
- Data Acquisition & Engineering: Identified, collected, and built pipelines for all relevant data sources.
- Exploratory Data Analysis: Uncovered patterns, anomalies, and key relationships within the data.
- Modeling: Developed, trained, and iterated on predictive/analytical models.
- Evaluation: Assessed model performance against business-defined success metrics.
- Implementation Planning: Outlined a deployment strategy, organizational considerations, and scalability roadmap.
- Communication: Delivered findings to both technical and non-technical audiences.
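The Evaluation step compares model performance with the success criteria agreed during Business Understanding. A minimal sketch of that comparison, assuming each criterion is a single threshold on a holdout metric; the `SuccessCriterion` class, the `evaluate_against_criteria` function, the metric names, and all numbers are hypothetical placeholders, not this project's actual KPIs or results.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SuccessCriterion:
    """One business-defined threshold on an evaluation metric (hypothetical structure)."""
    metric: str
    threshold: float
    higher_is_better: bool = True

    def is_met(self, value: float) -> bool:
        # Higher-is-better metrics must reach the threshold; lower-is-better must not exceed it.
        return value >= self.threshold if self.higher_is_better else value <= self.threshold


def evaluate_against_criteria(
    metrics: Dict[str, float], criteria: List[SuccessCriterion]
) -> Dict[str, bool]:
    """Map each criterion's metric name to whether the holdout result meets it.

    A metric missing from `metrics` is reported as not met instead of raising.
    """
    results: Dict[str, bool] = {}
    for criterion in criteria:
        value = metrics.get(criterion.metric)
        results[criterion.metric] = value is not None and criterion.is_met(value)
    return results


if __name__ == "__main__":
    # Placeholder thresholds and illustrative metric values only.
    criteria = [
        SuccessCriterion("recall", 0.80),
        SuccessCriterion("false_positive_rate", 0.05, higher_is_better=False),
    ]
    holdout_metrics = {"recall": 0.83, "false_positive_rate": 0.07}
    print(evaluate_against_criteria(holdout_metrics, criteria))
    # {'recall': True, 'false_positive_rate': False}
```

Keeping thresholds in data rather than in ad hoc `if` statements makes it easy to report every KPI as pass/fail in the final presentation.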
Summarize your primary findings and outcomes here.
- [Result 1: e.g., Achieved XX% accuracy on holdout set]
- [Result 2: e.g., Identified $XM in potential cost savings]
- [Result 3: e.g., Reduced processing time by XX%]
| Category | Tools |
|---|---|
| Languages | Python, SQL |
| Data Processing | Pandas, NumPy, PySpark |
| Modeling | Scikit-learn, XGBoost, TensorFlow / PyTorch |
| Visualization | Matplotlib, Seaborn, Plotly, Tableau |
| Infrastructure | AWS / GCP / Azure, Docker |
| Version Control | Git, GitHub |
- Python 3.9+
- Conda or virtualenv
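Before creating the environment, it can help to confirm the interpreter meets the Python 3.9+ prerequisite. A minimal standard-library sketch; `meets_min_python` is a hypothetical helper, not a file in this repository.

```python
import sys
from typing import Optional, Sequence, Tuple

# Minimum version from the prerequisites list.
MIN_PYTHON: Tuple[int, int] = (3, 9)


def meets_min_python(
    minimum: Tuple[int, int] = MIN_PYTHON,
    current: Optional[Sequence[int]] = None,
) -> bool:
    """Return True if `current` (default: the running interpreter) is at least `minimum`.

    Only the (major, minor) parts are compared.
    """
    if current is None:
        current = sys.version_info
    return tuple(current[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if not meets_min_python():
        sys.exit(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    print(f"Python {found} OK")
```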
# Clone the repository
git clone https://github.com/your-org/capstone-project.git
cd capstone-project
# Create and activate environment
conda env create -f environment.yml
conda activate capstone
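# Or, using pip inside a virtual environment (macOS/Linux activation shown)
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Run data preprocessing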
python src/data/preprocess.py
# Train the model
python src/models/train.py
# Launch the dashboard (if applicable)
python dashboards/app.py
- Business Plan
- Project Implementation Plan
- Exploratory Data Analysis Report
- Model Documentation
- Final Presentation Deck
- Executive Summary
We would like to thank our course instructors, program faculty, and any industry partners or mentors who supported this project throughout the MSDS Capstone course (Section 55).
For questions or collaboration inquiries, please reach out to the project team via GitHub Issues or the contact information below.
| Team Member | Email |
|---|---|
| Joshua Pasaye | joshuapasaye2027@u.northwestern.edu |
This project was completed in partial fulfillment of the requirements for the Master of Science in Data Science program.