Welcome to the Vehicle Price Prediction MLOps project! This repository demonstrates an end-to-end machine learning workflow that integrates:
- Automation
- Packaging
- Model Training
- Data Pipelines
- Cloud Integration
- Dockerization
- CI/CD with GitHub Actions & AWS
Goal: Build a production-grade ML pipeline that predicts vehicle prices using modern DevOps principles.
                +-------------------+
                | Template Creation |
                +--------+----------+
                         |
                +--------v----------+
                | Environment Setup |
                +--------+----------+
                         |
                +--------v------------+
                | MongoDB Integration |
                +--------+------------+
                         |
+------------------------v-------------------------------+
| Data Pipelines (Ingestion, Validation, Transformation) |
+------------------------+-------------------------------+
                         |
    +--------------------v------------------------+
    | Model Training + Evaluation + Pushing to S3 |
    +--------------------+------------------------+
                         |
      +------------------v------------------+
      | Flask API + Docker + EC2 Deployment |
      +-------------------------------------+
- Local package creation using `setup.py` and `pyproject.toml`
- MongoDB Atlas integration for scalable cloud-based storage
- Modular codebase with `src/` architecture and reusable configs
- Data ingestion, validation, and transformation pipelines
- Model training, evaluation, and deployment logic
- AWS S3 model storage with versioning
- Real-time prediction API served with Flask
- CI/CD using Docker, GitHub Actions, ECR, and EC2
- Fully automated deployment on port `5080` with custom domain support
| Domain | Tools / Services Used |
|---|---|
| Programming | Python 3.10 |
| Package Management | pip, conda |
| Data Storage | MongoDB Atlas |
| Data Handling | pandas, PyYAML |
| Model Training | scikit-learn, custom estimators |
| Cloud Services | AWS S3, IAM, EC2, ECR |
| Deployment | Flask, Docker, GitHub Actions |
| CI/CD | GitHub Actions, Self-hosted Runner on EC2 |
.
├── .github/workflows/
│   └── aws.yaml                 # CI/CD workflow
├── src/
│   ├── components/              # Data ingestion, validation, transformation etc.
│   ├── configuration/           # MongoDB & AWS connections
│   ├── data_access/             # MongoDB data fetch logic
│   ├── entity/                  # Config and artifact entities
│   └── aws_storage/             # AWS S3 integration
├── notebook/                    # EDA and MongoDB demo
├── templates/ & static/         # For Flask App
├── dockerfile                   # Docker setup
├── app.py                       # Flask application
├── requirements.txt
├── pyproject.toml & setup.py    # Local package management
└── README.md
python template.py
conda create -n vehicle python=3.10 -y
conda activate vehicle
pip install -r requirements.txt
pip list  # Confirm local package installation

MongoDB Atlas setup:
- Create a project and cluster (M0 tier)
- Create a DB user and whitelist `0.0.0.0/0`
- Copy the Python connection string (replace `<password>`)
# Mac/Linux
export MONGODB_URL="mongodb+srv://<user>:<pass>@cluster.mongodb.net/..."
# Windows PowerShell
$env:MONGODB_URL = "mongodb+srv://<user>:<pass>@cluster.mongodb.net/..."

Data ingestion:
- Fetch raw data from MongoDB
- Convert to pandas DataFrame
- Store artifacts for next stage
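
The ingestion steps above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names, the `database` and `collection` arguments, the use of `pymongo` as the driver, and the choice to drop MongoDB's `_id` field are all assumptions. Only the `MONGODB_URL` variable comes from the setup described earlier.

```python
import os


def get_mongodb_url(env=None):
    """Read the connection string from MONGODB_URL, failing early if it is unset."""
    env = os.environ if env is None else env
    url = env.get("MONGODB_URL")
    if not url:
        raise RuntimeError("MONGODB_URL is not set; export it before running ingestion")
    return url


def drop_object_ids(documents):
    """Return copies of the documents without MongoDB's internal `_id` field."""
    return [{k: v for k, v in doc.items() if k != "_id"} for doc in documents]


def fetch_dataframe(database, collection):
    """Fetch every document in a collection and return it as a pandas DataFrame.

    `database` and `collection` are hypothetical names supplied by the caller.
    """
    # Imported lazily so the helpers above work without these packages installed.
    import pandas as pd
    from pymongo import MongoClient

    client = MongoClient(get_mongodb_url())
    try:
        documents = list(client[database][collection].find())
    finally:
        client.close()
    return pd.DataFrame(drop_object_ids(documents))
```

A caller might then write the result to disk for the next stage, for example with `fetch_dataframe("my_db", "my_collection").to_csv("raw.csv", index=False)`, where the database, collection, and file names are placeholders.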

Data validation:
- Schema validation via `schema.yaml`
- Check for missing/null/incorrect data
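
A check along these lines might look like the sketch below. The schema layout (a `columns` mapping from column name to expected Python type) and the column names are hypothetical, and the real `schema.yaml` may be organized differently. A fuller version would load the file with `yaml.safe_load` and map its type names to Python types; here the schema is written inline to keep the example self-contained.

```python
def validate_rows(rows, schema):
    """Return a list of human-readable problems found in `rows`.

    `schema["columns"]` maps each required column name to its expected type.
    """
    problems = []
    expected = schema["columns"]
    for index, row in enumerate(rows):
        for column, expected_type in expected.items():
            if column not in row:
                problems.append(f"row {index}: missing column '{column}'")
            elif row[column] is None:
                problems.append(f"row {index}: null value in '{column}'")
            elif not isinstance(row[column], expected_type):
                problems.append(
                    f"row {index}: '{column}' should be {expected_type.__name__}, "
                    f"got {type(row[column]).__name__}"
                )
    return problems


# Hypothetical schema and data, for illustration only.
SCHEMA = {"columns": {"year": int, "mileage": float, "fuel_type": str}}
ROWS = [
    {"year": 2018, "mileage": 42000.0, "fuel_type": "petrol"},
    {"year": "2018", "mileage": None},  # wrong type, null value, missing column
]

for problem in validate_rows(ROWS, SCHEMA):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets the pipeline report every issue in a single validation artifact.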

Data transformation:
- Feature engineering
- Convert raw features into model-ready format
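
To make "model-ready format" concrete, here is a small plain-Python sketch that scales numeric columns to the 0 to 1 range and one-hot encodes categorical ones. It is a stand-in for illustration only: the project itself most likely relies on scikit-learn transformers (such as `MinMaxScaler` and `OneHotEncoder`), and the column names below are hypothetical.

```python
def fit_transformer(rows, numeric, categorical):
    """Learn min/max per numeric column and the category list per categorical column."""
    ranges = {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in numeric}
    categories = {c: sorted({r[c] for r in rows}) for c in categorical}
    return {"ranges": ranges, "categories": categories}


def transform(rows, params):
    """Turn each row into a flat list of floats: scaled numerics, then one-hot flags."""
    vectors = []
    for row in rows:
        vector = []
        for column, (low, high) in params["ranges"].items():
            span = high - low
            # A constant column carries no information; map it to 0.0.
            vector.append((row[column] - low) / span if span else 0.0)
        for column, values in params["categories"].items():
            vector.extend(1.0 if row[column] == value else 0.0 for value in values)
        vectors.append(vector)
    return vectors


# Hypothetical rows, for illustration only.
ROWS = [
    {"mileage": 10000, "fuel_type": "diesel"},
    {"mileage": 30000, "fuel_type": "petrol"},
    {"mileage": 20000, "fuel_type": "petrol"},
]
params = fit_transformer(ROWS, numeric=["mileage"], categorical=["fuel_type"])
features = transform(ROWS, params)
```

The split into a fit step and a transform step matters: the parameters should be learned on training data only and then reused unchanged for validation data and for requests at prediction time.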

Model training:
- Custom training logic using `sklearn`
- Save best model artifact
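
One way to read "save best model artifact" is: fit several candidates, score each on held-out data, and persist the winner. The sketch below does that under stated assumptions: RMSE as the selection metric, `pickle` as the serialization format, and the function names are all illustrative choices, not taken from the repository.

```python
import math
import pickle
from pathlib import Path


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def select_best(candidates, X_train, y_train, X_val, y_val):
    """Fit every candidate and return (name, model, validation RMSE) of the best one.

    `candidates` maps a name to any object exposing scikit-learn style
    `fit(X, y)` and `predict(X)` methods.
    """
    best = None
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        score = rmse(y_val, model.predict(X_val))
        if best is None or score < best[2]:
            best = (name, model, score)
    if best is None:
        raise ValueError("no candidate models were provided")
    return best


def save_model(model, path):
    """Pickle the fitted model, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return path
```

With scikit-learn installed, `candidates` could be something like `{"linear": LinearRegression(), "forest": RandomForestRegressor(random_state=0)}`; any estimator with `fit` and `predict` works.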

Model evaluation and push:
- Compare old vs. new model
- Upload latest model to AWS S3 if performance improves
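
The compare-then-upload rule can be expressed in a few lines. In this sketch the metric (RMSE, lower is better), the optional `min_improvement` threshold, and the function names are assumptions; the upload itself uses boto3's `upload_file`, which needs AWS credentials to be configured.

```python
def should_promote(new_rmse, current_rmse, min_improvement=0.0):
    """Decide whether the new model replaces the one currently in S3.

    Lower RMSE is better. `current_rmse` is None when no model has been
    uploaded yet, in which case the new model is always promoted.
    """
    if current_rmse is None:
        return True
    return (current_rmse - new_rmse) > min_improvement


def push_model(local_path, bucket, key):
    """Upload the model file to S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so should_promote has no AWS dependency

    boto3.client("s3").upload_file(str(local_path), bucket, key)


def evaluate_and_push(new_rmse, current_rmse, local_path, bucket, key, upload=push_model):
    """Upload only when the new model beats the current one; return True if pushed."""
    if not should_promote(new_rmse, current_rmse):
        return False
    upload(local_path, bucket, key)
    return True
```

Passing the uploader as a parameter keeps the decision logic testable without touching AWS: a test can supply a stub in place of `push_model`.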

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

- Region: `us-east-1`
- Bucket: `my-model-mlopsproj`

CI/CD setup:
- Configure `aws.yaml` under `.github/workflows`
- Build Docker image → push to AWS ECR
- Launch EC2 instance → connect EC2 to GitHub runner
- Auto-deploy Flask app to port `5000`
Once deployed:
http://<EC2-PUBLIC-IP>:5000/
Use the `/training` route to trigger model training manually:
http://<EC2-PUBLIC-IP>:5000/training
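
The same two URLs can be called from a script instead of a browser. This is a small standard-library sketch; it assumes the `/training` route answers a plain GET request (as the browser URL above suggests) and that training may take a while, hence the generous, arbitrary timeout.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def build_url(host, port, route="/"):
    """Compose an http:// URL for the deployed app."""
    return urljoin(f"http://{host}:{port}/", route.lstrip("/"))


def trigger_training(host, port, timeout=600):
    """Send a GET request to the /training route and return (status, body text)."""
    with urlopen(build_url(host, port, "/training"), timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")
```

For example, `trigger_training("<EC2-PUBLIC-IP>", 5000)` would request the training URL shown above, with the placeholder replaced by the instance's public IP address.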
| Name | Description |
|---|---|
| `AWS_ACCESS_KEY_ID` | AWS Access Key |
| `AWS_SECRET_ACCESS_KEY` | AWS Secret Key |
| `AWS_DEFAULT_REGION` | Default AWS Region (`us-east-1`) |
| `ECR_REPO` | Docker Repo URI from AWS ECR |
- Add model drift detection
- Add monitoring with Prometheus/Grafana
- Integrate GitHub issues via bot
- ⏳ Add frontend UI for better UX
- ⏳ Switch to Terraform for infrastructure provisioning
Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License.
Aditya Arya
Machine Learning Engineer | MLOps Enthusiast
If you like this project, give it a star on GitHub!