This project implements a complete MLOps pipeline for a U.S. Visa Approval Classification System, covering every stage from data ingestion to deployment and monitoring. The goal is to predict whether a visa application will be approved or denied, using machine learning and production-grade MLOps tools.
Live URL
http://44.203.207.140:8080/
The application is deployed on AWS EC2 using Docker containers, with the image stored and pulled directly from AWS Elastic Container Registry (ECR) through an automated GitHub Actions CI/CD pipeline.
This project demonstrates:
- Data ingestion & transformation
- Model training & hyperparameter optimization
- Model registry and versioning with AWS S3
- FastAPI deployment
- Continuous evaluation with Evidently AI
The design follows end-to-end MLOps practices, with scalability, reproducibility, and maintainability as its goals.
| Category | Tools / Libraries |
|---|---|
| Data Processing | pandas, numpy, matplotlib, seaborn, plotly |
| ML Modeling | scikit-learn, xgboost, catboost, imblearn, scipy |
| MLOps & Monitoring | dill, PyYAML, neuro_mf, boto3, botocore, mypy-boto3-s3, evidently==0.2.8 |
| Database | pymongo |
| Backend/API | fastapi, uvicorn, jinja2, python-multipart |
| Utilities | from_root, certifi, dnspython |
MLOpsE2EClassificationTermProject/
│
├── data/                  # Raw & processed data
├── notebooks/             # Exploratory analysis notebooks
├── src/
│   ├── components/        # Data ingestion, transformation, training modules
│   ├── pipeline/          # Training & prediction pipelines
│   ├── utils/             # Helper functions
│   ├── logger.py          # Custom logging
│   └── exception.py       # Error handling
│
├── app.py                 # FastAPI main application
├── template.py            # Folder structure generator
├── requirements.txt
├── setup.py
└── README.md
conda create -n visa python=3.8 -y
conda activate visa
pip install -r requirements.txt
pip uninstall -y pymongo motor mongoengine djongo
pip install -U "pymongo>=4.7" dnspython certifi
- Handles missing values and outliers
- Encodes categorical variables
- Normalizes numeric features
- Trains multiple models (XGBoost, CatBoost, RandomForest, etc.)
- Uses GridSearchCV for parameter optimization
- Saves model artifacts with `dill`
- Stores trained models and metadata in AWS S3
- Uses `boto3` and `neuro_mf` for version tracking
- REST API endpoint for prediction: `/predict`
- Web UI using Jinja2 templates
- Deployed using Uvicorn
- Integrated with Evidently AI (v0.2.8) for drift detection
- Tracks model performance and feature drift over time
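The transformation and training bullets above can be illustrated with a minimal scikit-learn sketch. It is not the project's actual code: the column names are borrowed from the sample request payload, the toy data is synthetic, and a single `RandomForestClassifier` with a tiny grid stands in for the multi-model search the project describes.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column names, taken from the sample request payload.
NUMERIC_COLS = ["job_experience", "employer_size", "prev_visa_denials"]
CATEGORICAL_COLS = ["country_of_origin", "education_level"]


def build_search():
    """Preprocessing + classifier wrapped in a grid search."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing values
        ("scale", StandardScaler()),                   # normalize numeric features
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC_COLS),
        # One-hot encode categoricals; unseen categories become all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
    ])
    model = Pipeline([
        ("preprocess", preprocess),
        ("clf", RandomForestClassifier(random_state=0)),
    ])
    # Deliberately small, illustrative grid.
    grid = {"clf__n_estimators": [10, 25], "clf__max_depth": [3, None]}
    return GridSearchCV(model, grid, cv=3, scoring="accuracy")


def make_toy_data(n=60, seed=0):
    """Synthetic rows for demonstration only; not the real visa dataset."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "job_experience": rng.integers(0, 20, n).astype(float),
        "employer_size": rng.integers(10, 5000, n).astype(float),
        "prev_visa_denials": rng.integers(0, 3, n).astype(float),
        "country_of_origin": rng.choice(["India", "Brazil", "Kenya"], n),
        "education_level": rng.choice(["Bachelors", "Masters"], n),
    })
    df.loc[0, "job_experience"] = np.nan  # exercise the imputer
    y = (df["job_experience"].fillna(0) > 8).astype(int)  # made-up label rule
    return df, y
```

Calling `build_search().fit(X, y)` on the toy frame yields a fitted search whose `best_estimator_` could then be serialized with `dill.dump`, in line with the artifact step listed above.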
python src/pipeline/training_pipeline.py
uvicorn app:app --reload
python src/components/data_monitoring.py
{
"case_id": "A12345",
"country_of_origin": "India",
"education_level": "Masters",
"job_experience": 5,
"employer_size": 200,
"prev_visa_denials": 0
}
Response:
{
"prediction": "Approved",
"probability": 0.89
}
Create a .env file in the root directory:
AWS_ACCESS_KEY_ID=<your_aws_key>
AWS_SECRET_ACCESS_KEY=<your_secret_key>
MONGODB_CLUSTER_URI=<your_mongo_connection_string>
BUCKET_NAME=<your_s3_bucket_name>
AWS_DEFAULT_REGION=<your_aws_region>
ECR_REPO=<your_ecr_url>
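Before starting the pipeline, it can help to confirm that the `.env` file defines every variable listed above. The following is a small standard-library sketch, not part of the project: the helper names `parse_env_file` and `missing_keys` are hypothetical, and treating all six keys as required is an assumption.

```python
# Keys taken from the .env listing; assumed to all be required.
REQUIRED_KEYS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "MONGODB_CLUSTER_URI",
    "BUCKET_NAME",
    "AWS_DEFAULT_REGION",
    "ECR_REPO",
]


def parse_env_file(path):
    """Read simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Split on the first "=" only, so values such as MongoDB URIs
            # with "?retryWrites=true" query strings stay intact.
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env_file(".env"))` returns an empty list when the file is complete.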
If you face MongoDB issues:
pip uninstall -y pymongo motor mongoengine djongo
pip install -U "pymongo>=4.7" dnspython certifi
If S3 upload fails:
- Check your AWS credentials
- Verify IAM role permissions
- Ensure correct bucket region
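The credential and region checks above can be scripted. This is a hedged sketch rather than project code: `bucket_region`, `region_matches`, and `check_from_environment` are hypothetical helpers, and the sketch assumes `BUCKET_NAME` and `AWS_DEFAULT_REGION` are exported as in the `.env` file. It relies on the boto3 calls `get_bucket_location` (S3) and `get_caller_identity` (STS).

```python
import os


def bucket_region(s3_client, bucket):
    """Return the region a bucket lives in."""
    response = s3_client.get_bucket_location(Bucket=bucket)
    # S3 reports us-east-1 buckets with a null LocationConstraint.
    return response.get("LocationConstraint") or "us-east-1"


def region_matches(s3_client, bucket, configured_region):
    """True when the bucket's region equals the configured one."""
    return bucket_region(s3_client, bucket) == configured_region


def check_from_environment():
    """Run the checks against real AWS; requires valid credentials."""
    import boto3  # imported lazily so the helpers above work without AWS

    # Fails with a credentials error if the keys are missing or invalid.
    identity = boto3.client("sts").get_caller_identity()
    s3 = boto3.client("s3")
    bucket = os.environ["BUCKET_NAME"]
    configured = os.environ["AWS_DEFAULT_REGION"]
    return {
        "caller_arn": identity["Arn"],
        "bucket_region": bucket_region(s3, bucket),
        "region_matches": region_matches(s3, bucket, configured),
    }
```

If `check_from_environment()` raises an access error on `get_bucket_location`, the IAM policy attached to the caller is the likely place to look.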
A[Data Ingestion] --> B[Data Transformation]
B --> C[Model Training]
C --> D[Model Evaluation]
D --> E[Model Storage (AWS S3)]
E --> F[FastAPI Deployment]
F --> G[Prediction API]
G --> H[Monitoring (Evidently AI)]
H --> A
pandas
numpy
matplotlib
plotly
seaborn
scipy
scikit-learn
imblearn
xgboost
catboost
pymongo
from_root
evidently==0.2.8
dill
PyYAML
neuro_mf
boto3
mypy-boto3-s3
botocore
fastapi
uvicorn
jinja2
python-multipart
-e .
Pankaj Kumar Pramanik, Data, AI & MLOps Engineer (pankajpramanik.com)