AdArya125/MLOPS-Project-1

🚗 Vehicle Price Prediction - End-to-End MLOps Project

Welcome to the Vehicle Price Prediction MLOps project! This repository demonstrates an end-to-end machine learning workflow that integrates:

  • ⚙️ Automation
  • 📦 Packaging
  • 🧪 Model Training
  • 📊 Data Pipelines
  • ☁️ Cloud Integration
  • 🐳 Dockerization
  • 🔁 CI/CD with GitHub Actions & AWS

Goal: Build a production-grade ML pipeline that predicts vehicle prices using modern DevOps principles.


🧱 Project Architecture

                        +-------------------+
                        | Template Creation |
                        +--------+----------+
                                 ↓
                        +--------v----------+
                        | Environment Setup |
                        +--------+----------+
                                 ↓
                        +--------v------------+
                        | MongoDB Integration |
                        +--------+------------+
                                 ↓
        +------------------------v-------------------------------+
        | Data Pipelines (Ingestion, Validation, Transformation) |
        +------------------------+-------------------------------+
                                 ↓
            +--------------------v------------------------+
            | Model Training + Evaluation + Pushing to S3 |
            +--------------------+------------------------+
                                 ↓
              +------------------v------------------+
              | Flask API + Docker + EC2 Deployment |
              +-------------------------------------+

🚀 Features

  • ✅ Local package creation using setup.py and pyproject.toml
  • ✅ MongoDB Atlas integration for scalable cloud-based storage
  • ✅ Modular codebase with src/ architecture and reusable configs
  • ✅ Data ingestion, validation, transformation pipelines
  • ✅ Model training, evaluation, and deployment logic
  • ✅ AWS S3 model storage with versioning
  • ✅ Real-time prediction API served with Flask
  • ✅ CI/CD using Docker, GitHub Actions, ECR, and EC2
  • ✅ Fully automated deployment on port 5000 with custom domain support

🛠️ Tech Stack

Domain               Tools / Services Used
Programming          Python 3.10
Package Management   pip, conda
Data Storage         MongoDB Atlas
Data Handling        pandas, PyYAML
Model Training       scikit-learn, custom estimators
Cloud Services       AWS S3, IAM, EC2, ECR
Deployment           Flask, Docker, GitHub Actions
CI/CD                GitHub Actions, Self-hosted Runner on EC2

📁 Directory Structure

.
├── .github/workflows/
│   └── aws.yaml                # CI/CD workflow
├── src/
│   ├── components/             # Data ingestion, validation, transformation etc.
│   ├── configuration/          # MongoDB & AWS connections
│   ├── data_access/            # MongoDB data fetch logic
│   ├── entity/                 # Config and artifact entities
│   ├── aws_storage/            # AWS S3 integration
├── notebook/                   # EDA and MongoDB demo
├── templates/ & static/        # For Flask App
├── dockerfile                  # Docker setup
├── app.py                      # Flask application
├── requirements.txt
├── pyproject.toml & setup.py   # Local package management
└── README.md

🧪 Setup Guide

🔹 Create Project Template & Environment

python template.py
conda create -n vehicle python=3.10 -y
conda activate vehicle
pip install -r requirements.txt

🔹 Set Up Local Packages

pip list  # Confirm local package installation

🔹 MongoDB Atlas Integration

  • Create a project and a cluster (M0 tier)
  • Create a DB user and allow network access from 0.0.0.0/0 (any IP address; convenient for a demo, too permissive for production)
  • Copy the Python connection string (replace <password>)

🔹 Set Up Environment Variables

# Mac/Linux
export MONGODB_URL="mongodb+srv://<user>:<pass>@cluster.mongodb.net/..."

# Windows PowerShell
$env:MONGODB_URL = "mongodb+srv://<user>:<pass>@cluster.mongodb.net/..."
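
A minimal sketch of how the pipeline could read this variable and fail early when it is missing. The helper name `get_mongodb_url` is hypothetical and is not taken from the repository; only the `MONGODB_URL` variable name comes from the steps above.

```python
import os


def get_mongodb_url(env=None):
    """Return the MongoDB connection string stored in MONGODB_URL.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to test. Hypothetical helper, not part of the repository.
    """
    env = os.environ if env is None else env
    url = env.get("MONGODB_URL", "").strip()
    if not url:
        raise RuntimeError("MONGODB_URL is not set; export it before running the pipeline")
    # Both URI schemes are accepted by MongoDB drivers; Atlas uses mongodb+srv.
    if not url.startswith(("mongodb://", "mongodb+srv://")):
        raise ValueError("MONGODB_URL must start with mongodb:// or mongodb+srv://")
    return url
```

Checking the variable once at startup gives a clearer error than a connection timeout deep inside the ingestion step.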

📈 ML Pipeline Components

📥 Data Ingestion

  • Fetch raw data from MongoDB
  • Convert to pandas DataFrame
  • Store artifacts for next stage
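
The ingestion steps above can be sketched with the standard library alone. This is an illustration under assumptions, not the repository's code: the function names and the `brand`/`price` fields are hypothetical, and the input is any iterable of dicts, which is what a pymongo `collection.find()` cursor yields. In the real pipeline the cleaned records would go into a pandas DataFrame (`pandas.DataFrame(records)`) before being saved.

```python
import csv
from pathlib import Path


def documents_to_records(documents):
    """Copy MongoDB-style documents into plain dicts, dropping the `_id` field."""
    return [{key: value for key, value in doc.items() if key != "_id"} for doc in documents]


def save_records_csv(records, path):
    """Write records to a CSV artifact and return the number of rows written."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Union of keys in first-seen order, so documents with missing fields still fit.
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # missing fields are written as empty cells
    return len(records)
```

Writing the raw data to a file gives the validation stage a stable input that does not depend on the database being reachable.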

🧪 Data Validation

  • Schema validation via schema.yaml
  • Check for missing/null/incorrect data
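
As a rough sketch of these checks, the function below validates records against a column-to-type mapping. The mapping stands in for the parsed contents of schema.yaml, whose real layout is not shown in this README, so both the schema shape and the function name `validate_records` are assumptions.

```python
def validate_records(records, schema):
    """Return a list of human-readable problems; an empty list means valid.

    `schema` maps column name -> expected Python type. It is an assumed
    stand-in for whatever schema.yaml actually contains.
    """
    problems = []
    for index, record in enumerate(records):
        missing = [col for col in schema if col not in record]
        if missing:
            problems.append(f"row {index}: missing columns {missing}")
        for col, expected in schema.items():
            value = record.get(col)
            if col in record and value is None:
                problems.append(f"row {index}: {col} is null")
            elif value is not None and not isinstance(value, expected):
                problems.append(f"row {index}: {col} should be {expected.__name__}")
    return problems
```

Returning a list of problems instead of raising on the first one lets the validation report describe every issue in a single run.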

🔄 Data Transformation

  • Feature engineering
  • Convert raw features into model-ready format
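
The README does not say which transformations the project applies. As an illustration only, here are two common ones written in plain Python: one-hot encoding for a categorical column and min-max scaling for a numeric one. scikit-learn provides `OneHotEncoder` and `MinMaxScaler` for the same purpose; the function names below are hypothetical.

```python
def one_hot(values):
    """One-hot encode a list of categories; returns (sorted categories, rows)."""
    categories = sorted(set(values))
    rows = [[1 if value == cat else 0 for cat in categories] for value in values]
    return categories, rows


def min_max_scale(values):
    """Scale numbers linearly into [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]
```

In practice the categories and the min/max must be learned from the training split only and then reused unchanged on validation and prediction data.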

🤖 Model Training

  • Custom training logic using sklearn
  • Save best model artifact
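
A small sketch of the "save best model" step, assuming each candidate has already been trained and scored on validation data with a metric where higher is better. The function name and the `candidates` structure are hypothetical; in the real pipeline the model objects would be fitted scikit-learn estimators, which can be pickled the same way.

```python
import pickle
from pathlib import Path


def save_best_model(candidates, path):
    """Pickle the highest-scoring model and return its name.

    `candidates` maps a name to a (model, validation_score) pair.
    """
    best_name = max(candidates, key=lambda name: candidates[name][1])
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(candidates[best_name][0], handle)
    return best_name
```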

🧪 Model Evaluation & S3 Upload

  • Compare old vs. new model
  • Upload latest model to AWS S3 if performance improves
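
The comparison rule can be sketched as a single function. The threshold value and the name `should_promote` are assumptions made for illustration; the README only states that the new model is uploaded when performance improves.

```python
def should_promote(new_score, current_score=None, min_improvement=0.02):
    """Decide whether a newly trained model should replace the stored one.

    Scores are assumed to be "higher is better" (for example R^2).
    `current_score` is None when no model has been uploaded yet.
    """
    if current_score is None:
        return True
    return (new_score - current_score) > min_improvement
```

When the function returns True, the upload itself could be done with the boto3 S3 client's `upload_file(Filename, Bucket, Key)` method. Requiring a minimum improvement avoids replacing the stored model over differences that are within noise.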

☁️ AWS Cloud Integration

✅ Set Up AWS Credentials

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

✅ S3 Bucket

  • Region: us-east-1
  • Bucket: my-model-mlopsproj

🐳 CI/CD Pipeline with GitHub Actions & Docker

  1. Configure aws.yaml under .github/workflows
  2. Build Docker Image → Push to AWS ECR
  3. Launch EC2 Instance → Connect EC2 to GitHub Runner
  4. Auto-deploy Flask App to port 5000

🌐 Access the Application

Once deployed:

http://<EC2-PUBLIC-IP>:5000/

Use the /training route to trigger model training manually:

http://<EC2-PUBLIC-IP>:5000/training
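
For scripted use, the same request can be sent with the standard library. This is a sketch under assumptions: the helper names are hypothetical, and it assumes the /training route answers a plain GET request, as opening the URL in a browser would send.

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen


def build_url(host, port=5000, route="/"):
    """Build an http URL for the deployed app, e.g. <host>:5000/training."""
    path = route if route.startswith("/") else "/" + route
    return urlunsplit(("http", f"{host}:{port}", path, "", ""))


def trigger_training(host, port=5000, timeout=600):
    """Send a GET request to /training and return the HTTP status code."""
    with urlopen(build_url(host, port, "/training"), timeout=timeout) as response:
        return response.status
```

Training can take a while, so the default timeout is deliberately generous; calling `trigger_training` with the EC2 public IP blocks until the server responds.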

🔒 Secrets Used in GitHub Actions

Name                    Description
AWS_ACCESS_KEY_ID       AWS Access Key
AWS_SECRET_ACCESS_KEY   AWS Secret Key
AWS_DEFAULT_REGION      Default AWS Region (us-east-1)
ECR_REPO                Docker Repo URI from AWS ECR

📌 Future Enhancements

  • ✅ Add model drift detection
  • ✅ Add monitoring with Prometheus/Grafana
  • ✅ Integrate GitHub issues via bot
  • ⏳ Add frontend UI for better UX
  • ⏳ Switch to Terraform for infrastructure provisioning

🙌 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.


📜 License

This project is licensed under the MIT License.


👤 Author

Aditya Arya
Machine Learning Engineer | MLOps Enthusiast


⭐ If you like this project, give it a star on GitHub!

About

This is a complete end-to-end MLOps project, using a RandomForest classifier, for vehicle insurance prediction. It implements CI/CD, containerization (Docker), MongoDB, and AWS services such as IAM, S3, EC2, and ECR.
