This repository contains a beta version of an end-to-end machine learning lifecycle pipeline, designed as an introduction to MLOps and production-grade machine learning workflows.
The project focuses on binary classification and is intended as a learning platform and starting point for more advanced projects in the future. The ultimate goal of this pipeline is to streamline and automate the full machine learning lifecycle: from training and evaluating models to serving predictions and managing retraining.
- Ingest and preprocess datasets
- Split data into training and test sets
- Train and evaluate models
- Track experiments and model versions
- Serve predictions with metadata logging
- Trigger retraining based on configurable thresholds
- Orchestrate all steps with a modular workflow
- Docker support for reproducibility
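To make the "modular workflow" idea concrete, here is a minimal plain-Python sketch in which each stage is an ordinary function and the orchestrator simply chains them. All function names and the toy in-memory dataset are illustrative assumptions, not this project's actual API:

```python
# Hypothetical sketch of a modular pipeline: each stage is a plain
# function, so stages can be swapped or rerun independently.

def ingest(path):
    # Stub loader: returns (features, label) rows instead of reading `path`.
    return [([0.1, 0.2], 0), ([0.9, 0.8], 1), ([0.4, 0.6], 0), ([0.7, 0.3], 0)]

def split(rows, test_fraction=0.25):
    # Deterministic train/test split by position.
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def train(train_rows):
    # Toy "model": always predict the majority class seen in training.
    labels = [y for _, y in train_rows]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority

def evaluate(model, test_rows):
    # Fraction of test rows the model labels correctly.
    correct = sum(1 for x, y in test_rows if model(x) == y)
    return correct / len(test_rows)

def run_pipeline(path="data/raw/dataset.csv"):
    rows = ingest(path)
    train_rows, test_rows = split(rows)
    model = train(train_rows)
    return evaluate(model, test_rows)

print(run_pipeline())
```

The design point is that each stage takes plain data in and returns plain data out, so any stage can be replaced (e.g. a real model in `train`) without touching the others.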
```
mlops_beta/
├── data/
│   ├── raw/                    # Original datasets
│   └── processed/              # Preprocessed datasets
├── artifacts/
│   ├── models/                 # Trained model artifacts
│   │   └── <hash_or_version>/
│   ├── preprocessing/          # Serialized preprocessing pipelines
│   └── predictions/            # Output predictions with metadata
├── experiments/
│   ├── <experiment_id>.yaml    # Experiment configuration
│   └── metadata.sqlite         # Experiment metadata storage
├── configs/
│   ├── data_config.yaml        # Dataset and preprocessing settings
│   ├── model_config.yaml       # Model architecture and hyperparameters
│   └── inference_config.yaml   # Serving and inference settings
├── dags/
│   └── ml_pipeline.py          # Pipeline orchestration
├── src/
│   ├── preprocessing.py        # Preprocessing functions
│   ├── training.py             # Model training logic
│   ├── inference.py            # Serving / inference logic
│   └── utils.py                # Utility functions
├── requirements.txt            # Python dependencies
├── environment.yaml            # Conda environment for reproducibility
├── Dockerfile                  # Optional Docker setup
└── README.md                   # This file
```
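As an illustration, a `configs/model_config.yaml` could look like the sketch below. Every key shown is a hypothetical example; adapt the names to whatever the config loader in `src/training.py` actually expects:

```yaml
# Hypothetical model configuration -- all keys are illustrative.
model:
  type: logistic_regression
  hyperparameters:
    C: 1.0
    max_iter: 1000
evaluation:
  metrics: [accuracy, f1]
  test_size: 0.2
```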
```bash
# Option 1: Using pip
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Option 2: Using conda
conda env create -f environment.yaml
conda activate mlops_beta

# Optional: Docker for full reproducibility
docker build -t mlops_beta:latest .
docker run -it --rm mlops_beta:latest
```

- Place your raw dataset in `data/raw/`.
- Adjust settings in `configs/data_config.yaml` and `configs/model_config.yaml`.
- Run the pipeline:

  ```bash
  python dags/ml_pipeline.py
  ```

The pipeline will:
- Preprocess data and split into train/test sets
- Train and evaluate the model
- Save artifacts and experiment metadata
- Serve predictions with logging
- Generate retraining signals if thresholds are exceeded
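The retraining signal in the last step could be implemented along these lines. This is a minimal sketch under stated assumptions: the function name, metric names, and threshold values are hypothetical, not the project's actual API:

```python
def should_retrain(current_metrics, thresholds):
    """Return the list of metrics that fell below their configured floor.

    A non-empty result acts as the retraining signal.
    """
    return [name for name, floor in thresholds.items()
            if current_metrics.get(name, 0.0) < floor]

# Example: accuracy has drifted below the configured floor, so it is flagged.
signals = should_retrain(
    current_metrics={"accuracy": 0.81, "f1": 0.77},
    thresholds={"accuracy": 0.85, "f1": 0.75},
)
print(signals)  # ["accuracy"]
```

Keeping the thresholds in a config file (e.g. alongside `configs/inference_config.yaml`) means retraining policy can be tuned without code changes.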
This is a beta version intended for learning and experimentation. Future versions will include:
- Support for multi-class classification and regression
- GPU-accelerated training
- REST API and streaming data support
- Advanced hyperparameter optimization
- Real-time monitoring and drift detection
- Deployment-ready containers and cloud integration
MIT License