🧠 MLOpsE2EClassificationTermProject

This project implements a complete MLOps pipeline for a U.S. Visa Approval Classification System, covering all essential components, from data ingestion to deployment and monitoring. The goal is to predict whether a visa application will be approved or denied, using machine learning and production-grade MLOps tools.

🌐 Live URL

🔗 http://44.203.207.140:8080/

The application is deployed on AWS EC2 using Docker containers, with the image stored and pulled directly from AWS Elastic Container Registry (ECR) through an automated GitHub Actions CI/CD pipeline.


📘 Overview

This project demonstrates:

  • Data ingestion & transformation
  • Model training & hyperparameter optimization
  • Model registry and versioning with AWS S3
  • FastAPI deployment
  • Continuous evaluation with Evidently AI

It is designed following end-to-end MLOps best practices, ensuring scalability, reproducibility, and maintainability.


⚙️ Tech Stack

  • Data Processing: pandas, numpy, matplotlib, seaborn, plotly
  • ML Modeling: scikit-learn, xgboost, catboost, imblearn, scipy
  • MLOps & Monitoring: dill, PyYAML, neuro_mf, boto3, botocore, mypy-boto3-s3, evidently==0.2.8
  • Database: pymongo
  • Backend/API: fastapi, uvicorn, jinja2, python-multipart
  • Utilities: from_root, certifi, dnspython

📂 Project Structure

MLOpsE2EClassificationTermProject/
│
├── data/                        # Raw & processed data
├── notebooks/                   # Exploratory analysis notebooks
├── src/
│   ├── components/              # Data ingestion, transformation, training modules
│   ├── pipeline/                # Training & prediction pipelines
│   ├── utils/                   # Helper functions
│   ├── logger.py                # Custom logging
│   └── exception.py             # Error handling
│
├── app.py                       # FastAPI main application
├── template.py                  # Folder structure generator
├── requirements.txt
├── setup.py
└── README.md

🧩 Installation

1️⃣ Create and activate conda environment

conda create -n visa python=3.8 -y
conda activate visa

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ (If a MongoDB error occurs)

pip uninstall -y pymongo motor mongoengine djongo
pip install -U "pymongo>=4.7" dnspython certifi

🧠 Features

🧮 Data Preprocessing

  • Handles missing values and outliers
  • Encodes categorical variables
  • Normalizes numeric features
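
For illustration only, a minimal preprocessing sketch using scikit-learn's ColumnTransformer (an assumption; this is not the project's actual transformation code, and the column names are hypothetical):

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny stand-in for the real visa dataset (columns are hypothetical)
df = pd.DataFrame({
    "country_of_origin": ["India", "Canada"],
    "education_level": ["Masters", "Bachelors"],
    "job_experience": [5, 2],
    "employer_size": [200, 50],
    "prev_visa_denials": [0, 1],
})

numeric_cols = ["job_experience", "employer_size", "prev_visa_denials"]
categorical_cols = ["country_of_origin", "education_level"]

preprocessor = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),        # fill missing values
        ("scale", StandardScaler()),                         # normalize numeric features
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),  # encode categorical variables
    ]), categorical_cols),
])

X = preprocessor.fit_transform(df)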

🧠 Model Training

  • Trains multiple models (XGBoost, CatBoost, RandomForest, etc.)
  • Uses GridSearchCV for parameter optimization
  • Saves model artifacts with dill
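
A minimal sketch of this step, assuming a RandomForest candidate, GridSearchCV, and dill serialization (the toy data, parameter grid, and paths are illustrative, not taken from the repository):

import os
import dill
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy data as a stand-in for the transformed training features
X_train, y_train = make_classification(n_samples=200, n_features=6, random_state=42)

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

os.makedirs("artifacts", exist_ok=True)
with open("artifacts/model.pkl", "wb") as f:
    dill.dump(search.best_estimator_, f)   # persist the best model with dill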

☁️ Model Versioning & Storage

  • Stores trained models and metadata in AWS S3
  • Uses boto3 and neuro_mf for version tracking
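
For illustration, a minimal boto3 upload keyed by a timestamp prefix; the bucket layout here is an assumption, not the project's actual versioning scheme:

import os
from datetime import datetime, timezone
import boto3

bucket = os.environ["BUCKET_NAME"]                        # from the .env file (see AWS Integration)
version = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
key = f"models/usvisa/{version}/model.pkl"                # hypothetical key layout

boto3.client("s3").upload_file("artifacts/model.pkl", bucket, key)
print(f"Uploaded model to s3://{bucket}/{key}")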

⚡ Deployment via FastAPI

  • REST API endpoint for prediction: /predict
  • Web UI using Jinja2 templates
  • Deployed using Uvicorn
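
A stripped-down sketch of such an endpoint (this is not the project's app.py; the real app applies the trained preprocessing pipeline before predicting):

import dill
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class VisaApplication(BaseModel):
    case_id: str
    country_of_origin: str
    education_level: str
    job_experience: int
    employer_size: int
    prev_visa_denials: int

with open("artifacts/model.pkl", "rb") as f:              # hypothetical artifact path
    model = dill.load(f)

@app.post("/predict")
def predict(payload: VisaApplication):
    # A real implementation would run the same feature transformation as training.
    features = [[payload.job_experience, payload.employer_size, payload.prev_visa_denials]]
    proba = float(model.predict_proba(features)[0][1])
    return {"prediction": "Approved" if proba >= 0.5 else "Denied",
            "probability": round(proba, 2)}

It can be served locally with the uvicorn command shown in the Usage section.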

📊 Continuous Monitoring

  • Integrated with Evidently AI (v0.2.8) for drift detection
  • Tracks model performance and feature drift over time
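
With Evidently 0.2.x, a drift report over a reference and a current dataset can be generated roughly like this (the CSV paths are placeholders, not files in the repository):

import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

reference = pd.read_csv("data/reference.csv")   # e.g. the training data snapshot
current = pd.read_csv("data/current.csv")       # e.g. recently logged production inputs

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("data_drift.html")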

🚀 Usage

🧪 Run training pipeline

python src/pipeline/training_pipeline.py

⚙️ Start API server

uvicorn app:app --reload

📈 Generate Evidently report

python src/components/data_monitoring.py

🧾 Example API Request

POST /predict

{
  "case_id": "A12345",
  "country_of_origin": "India",
  "education_level": "Masters",
  "job_experience": 5,
  "employer_size": 200,
  "prev_visa_denials": 0
}

Response:

{
  "prediction": "Approved",
  "probability": 0.89
}
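
The same request can be made from Python, for example with requests (not part of requirements.txt; install it separately):

import requests

payload = {
    "case_id": "A12345",
    "country_of_origin": "India",
    "education_level": "Masters",
    "job_experience": 5,
    "employer_size": 200,
    "prev_visa_denials": 0,
}

resp = requests.post("http://44.203.207.140:8080/predict", json=payload, timeout=10)
print(resp.json())   # e.g. {"prediction": "Approved", "probability": 0.89}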

☁️ AWS Integration

Environment Variables

Create a .env file in the root directory:

AWS_ACCESS_KEY_ID=<your_aws_key>
AWS_SECRET_ACCESS_KEY=<your_secret_key>
MONGODB_CLUSTER_URI=<your_mongo_connection_string>
BUCKET_NAME=<your_s3_bucket_name>
AWS_DEFAULT_REGION=<your_aws_region>
ECR_REPO=<your_ecr_url>
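
These values can then be read at application startup, for example (python-dotenv is an assumption here; it is not listed in requirements.txt):

import os
from dotenv import load_dotenv
import boto3

load_dotenv()                                             # reads the .env file from the project root
s3 = boto3.client("s3", region_name=os.getenv("AWS_DEFAULT_REGION"))
print(s3.list_objects_v2(Bucket=os.environ["BUCKET_NAME"], MaxKeys=1).get("KeyCount"))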

🧹 Troubleshooting

If you face MongoDB issues:

pip uninstall -y pymongo motor mongoengine djongo
pip install -U "pymongo>=4.7" dnspython certifi
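
After reinstalling, connectivity can be confirmed with a quick ping (MONGODB_CLUSTER_URI is the variable from the .env section above):

import os
import certifi
from pymongo import MongoClient

client = MongoClient(os.environ["MONGODB_CLUSTER_URI"], tlsCAFile=certifi.where())
print(client.admin.command("ping"))   # {'ok': 1.0} means the connection works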

If S3 upload fails:

  • Check your AWS credentials
  • Verify IAM role permissions
  • Ensure correct bucket region
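
A quick way to check these points from Python (a sanity check, not part of the project's code):

import os
import boto3

print(boto3.client("sts").get_caller_identity()["Arn"])            # which identity the credentials resolve to
s3 = boto3.client("s3")
s3.head_bucket(Bucket=os.environ["BUCKET_NAME"])                   # raises if the bucket or permissions are wrong
print(s3.get_bucket_location(Bucket=os.environ["BUCKET_NAME"]))    # confirms the bucket's region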

📊 MLOps Pipeline (Flow)

    A[Data Ingestion] --> B[Data Transformation]
    B --> C[Model Training]
    C --> D[Model Evaluation]
    D --> E[Model Storage (AWS S3)]
    E --> F[FastAPI Deployment]
    F --> G[Prediction API]
    G --> H[Monitoring (Evidently AI)]
    H --> A

📦 Requirements Summary

pandas
numpy
matplotlib
plotly
seaborn
scipy
scikit-learn
imblearn
xgboost
catboost
pymongo
from_root
evidently==0.2.8
dill
PyYAML
neuro_mf
boto3
mypy-boto3-s3
botocore
fastapi
uvicorn
jinja2
python-multipart
-e .

👨‍💻 Author

Pankaj Kumar Pramanik
Data, AI & MLOps Engineer
🌐 pankajpramanik.com
