End-to-End Machine Learning Lifecycle (Beta)

Overview

This repository contains a beta version of an End-to-End Machine Learning Lifecycle pipeline, designed as an introduction to MLOps and production-grade machine learning workflows.

The project focuses on binary classification and is intended as a learning platform and a starting point for more advanced projects. The ultimate goal of the pipeline is to streamline and automate the full machine learning lifecycle: from training and evaluating models to serving predictions and managing retraining.

Features

  • Ingest and preprocess datasets

  • Split data into training and test sets

  • Train and evaluate models

  • Track experiments and model versions

  • Serve predictions with metadata logging

  • Trigger retraining based on configurable thresholds

  • Orchestrate all steps with a modular workflow

  • Docker support for reproducibility
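As an illustration of the "serve predictions with metadata logging" feature, the sketch below wraps a model call and appends a JSON-line record per prediction. The function name, record fields, and log path are hypothetical, not the repository's actual API; in this project the log would presumably land under artifacts/predictions/.

```python
import json
import uuid
from datetime import datetime, timezone

def serve_prediction(model, features, model_version,
                     log_path="predictions_log.jsonl"):
    """Run a prediction and append a metadata record as one JSON line."""
    prediction = model(features)
    record = {
        "prediction_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return prediction

# Usage with a stand-in model: predict 1 when the feature sum is positive
toy_model = lambda feats: int(sum(feats) > 0)
serve_prediction(toy_model, [0.4, -0.1], model_version="v1")  # returns 1
```

Logging one self-describing record per prediction (JSON Lines) keeps serving auditable and makes the log easy to replay when evaluating retraining triggers.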

Folder Structure

mlops_beta/
├── data/
│   ├── raw/                   # Original datasets
│   └── processed/             # Preprocessed datasets
├── artifacts/
│   ├── models/                # Trained model artifacts
│   │   └── <hash_or_version>/
│   ├── preprocessing/         # Serialized preprocessing pipelines
│   └── predictions/           # Output predictions with metadata
├── experiments/
│   ├── <experiment_id>.yaml   # Experiment configuration
│   └── metadata.sqlite        # Experiment metadata storage
├── configs/
│   ├── data_config.yaml       # Dataset and preprocessing settings
│   ├── model_config.yaml      # Model architecture and hyperparameters
│   └── inference_config.yaml  # Serving and inference settings
├── dags/
│   └── ml_pipeline.py         # Pipeline orchestration
├── src/
│   ├── preprocessing.py       # Preprocessing functions
│   ├── training.py            # Model training logic
│   ├── inference.py           # Serving / inference logic
│   └── utils.py               # Utility functions
├── requirements.txt           # Python dependencies
├── environment.yaml           # Conda environment for reproducibility
├── Dockerfile                 # Optional Docker setup
└── README.md                  # This file

Getting Started

1. Environment Setup

# Option 1: Using pip
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Option 2: Using conda
conda env create -f environment.yaml
conda activate mlops_beta

# Optional: Docker for full reproducibility
docker build -t mlops_beta:latest .
docker run -it --rm mlops_beta:latest

2. Running the Pipeline

  1. Place your raw dataset in data/raw/.

  2. Adjust settings in configs/data_config.yaml and configs/model_config.yaml.

  3. Run the pipeline:

python dags/ml_pipeline.py
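The exact schema of the config files is project-specific; as a purely hypothetical illustration, a configs/model_config.yaml for the binary-classification beta might look like this (all key names are assumptions, not prescribed by the repository):

```yaml
# Hypothetical model_config.yaml -- key names are illustrative only
model:
  type: logistic_regression     # beta supports binary classification only
  hyperparameters:
    C: 1.0
    max_iter: 200
evaluation:
  test_size: 0.2
  metrics: [accuracy, roc_auc]
retraining:
  metric: roc_auc
  threshold: 0.75               # signal retraining if metric falls below this
```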

The pipeline will:

  • Preprocess data and split into train/test sets

  • Train and evaluate the model

  • Save artifacts and experiment metadata

  • Serve predictions with logging

  • Generate retraining signals if thresholds are exceeded
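The retraining signal in the last step could be as simple as comparing the latest evaluation metrics against the configured thresholds. A minimal sketch, where the function name and the metrics/thresholds shapes are assumptions rather than the repository's actual interface:

```python
def retraining_signal(metrics, thresholds):
    """Return the names of metrics that fell below their configured floors.

    A non-empty result means the pipeline should flag the model for retraining.
    Metrics missing from the latest run are treated as failing.
    """
    return [name for name, floor in thresholds.items()
            if metrics.get(name, 0.0) < floor]

# Example: ROC AUC has dropped below its floor, accuracy is still fine
signal = retraining_signal(
    metrics={"accuracy": 0.91, "roc_auc": 0.72},
    thresholds={"accuracy": 0.85, "roc_auc": 0.75},
)
# signal == ["roc_auc"] -> trigger retraining
```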

Future Development

This is a beta version intended for learning and experimentation. Future versions will include:

  • Support for multi-class classification and regression

  • GPU-accelerated training

  • REST API and streaming data support

  • Advanced hyperparameter optimization

  • Real-time monitoring and drift detection

  • Deployment-ready containers and cloud integration

License

MIT License
