This repository contains a beta version of an end-to-end machine learning lifecycle pipeline, designed as an introduction to MLOps and production-grade machine learning workflows.
The project focuses on binary classification and is intended as a learning platform and starting point for more advanced projects in the future. The ultimate goal of this pipeline is to streamline and automate the full machine learning lifecycle: from training and evaluating models to serving predictions and managing retraining.
- Ingest and preprocess datasets
- Split data into training and test sets
- Train and evaluate models
- Track experiments and model versions
- Serve predictions with metadata logging
- Trigger retraining based on configurable thresholds
- Orchestrate all steps with a modular workflow
- Docker support for reproducibility
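To make the "modular workflow" idea concrete, here is a minimal plain-Python sketch in which each stage is an ordinary function and the orchestrator simply chains them. All function names and the toy in-memory dataset are illustrative assumptions, not this project's actual API:

```python
# Hypothetical sketch of a modular pipeline: each stage is a plain
# function, so stages can be swapped or rerun independently.

def ingest(path):
    # Stub loader: returns (features, label) rows instead of reading `path`.
    return [([0.1, 0.2], 0), ([0.9, 0.8], 1), ([0.4, 0.6], 0), ([0.7, 0.3], 0)]

def split(rows, test_fraction=0.25):
    # Deterministic train/test split by position.
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def train(train_rows):
    # Toy "model": always predict the majority class seen in training.
    labels = [y for _, y in train_rows]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority

def evaluate(model, test_rows):
    # Fraction of test rows the model labels correctly.
    correct = sum(1 for x, y in test_rows if model(x) == y)
    return correct / len(test_rows)

def run_pipeline(path="data/raw/dataset.csv"):
    rows = ingest(path)
    train_rows, test_rows = split(rows)
    model = train(train_rows)
    return evaluate(model, test_rows)

print(run_pipeline())
```

The design point is that each stage takes plain data in and returns plain data out, so any stage can be replaced (e.g. a real model in `train`) without touching the others.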
```
mlops_beta/
├── data/
│   ├── raw/                    # Original datasets
│   └── processed/              # Preprocessed datasets
├── artifacts/
│   ├── models/                 # Trained model artifacts
│   │   └── <hash_or_version>/
│   ├── preprocessing/          # Serialized preprocessing pipelines
│   └── predictions/            # Output predictions with metadata
├── experiments/
│   ├── <experiment_id>.yaml    # Experiment configuration
│   └── metadata.sqlite         # Experiment metadata storage
├── configs/
│   ├── data_config.yaml        # Dataset and preprocessing settings
│   ├── model_config.yaml       # Model architecture and hyperparameters
│   └── inference_config.yaml   # Serving and inference settings
├── dags/
│   └── ml_pipeline.py          # Pipeline orchestration
├── src/
│   ├── preprocessing.py        # Preprocessing functions
│   ├── training.py             # Model training logic
│   ├── inference.py            # Serving / inference logic
│   └── utils.py                # Utility functions
├── requirements.txt            # Python dependencies
├── environment.yaml            # Conda environment for reproducibility
├── Dockerfile                  # Optional Docker setup
└── README.md                   # This file
```
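As an illustration, a `configs/model_config.yaml` could look like the sketch below. Every key shown is a hypothetical example; adapt the names to whatever the config loader in `src/training.py` actually expects:

```yaml
# Hypothetical model configuration -- all keys are illustrative.
model:
  type: logistic_regression
  hyperparameters:
    C: 1.0
    max_iter: 1000
evaluation:
  metrics: [accuracy, f1]
  test_size: 0.2
```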
```bash
# Option 1: Using pip
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Option 2: Using conda
conda env create -f environment.yaml
conda activate mlops_beta

# Optional: Docker for full reproducibility
docker build -t mlops_beta:latest .
docker run -it --rm mlops_beta:latest
```

- Place your raw dataset in `data/raw/`.
- Adjust settings in `configs/data_config.yaml` and `configs/model_config.yaml`.
- Run the pipeline:

  ```bash
  python dags/ml_pipeline.py
  ```

The pipeline will:
- Preprocess data and split into train/test sets
- Train and evaluate the model
- Save artifacts and experiment metadata
- Serve predictions with logging
- Generate retraining signals if thresholds are exceeded
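The retraining signal in the last step could be implemented along these lines. This is a minimal sketch under stated assumptions: the function name, metric names, and threshold values are hypothetical, not the project's actual API:

```python
def should_retrain(current_metrics, thresholds):
    """Return the list of metrics that fell below their configured floor.

    A non-empty result acts as the retraining signal.
    """
    return [name for name, floor in thresholds.items()
            if current_metrics.get(name, 0.0) < floor]

# Example: accuracy has drifted below the configured floor, so it is flagged.
signals = should_retrain(
    current_metrics={"accuracy": 0.81, "f1": 0.77},
    thresholds={"accuracy": 0.85, "f1": 0.75},
)
print(signals)  # ["accuracy"]
```

Keeping the thresholds in a config file (e.g. alongside `configs/inference_config.yaml`) means retraining policy can be tuned without code changes.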
This is a beta version intended for learning and experimentation. Future versions will include:
- Support for multi-class classification and regression
- GPU-accelerated training
- REST API and streaming data support
- Advanced hyperparameter optimization
- Real-time monitoring and drift detection
- Deployment-ready containers and cloud integration
MIT License