Skip to content

dev-opsss/MLOps-CI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

27 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

MLOps CI/CD with TensorFlow and CML πŸ€–

MLOps CI

A complete MLOps pipeline demonstrating automated machine learning model training, evaluation, and reporting using TensorFlow, GitHub Actions, and Continuous Machine Learning (CML). This project showcases best practices for ML automation, model performance tracking, and reproducible machine learning workflows.

🎯 Project Overview

This repository implements an end-to-end machine learning pipeline that:

  • Automatically trains a TensorFlow neural network on synthetic linear data
  • Evaluates model performance using comprehensive metrics
  • Generates visual reports with training results and predictions
  • Creates automated reports via CML comments on GitHub PRs
  • Ensures reproducibility through version-controlled ML workflows

Key Features

  • πŸ”„ Automated CI/CD Pipeline: Triggered on every push/PR
  • πŸ“Š Performance Tracking: MAE, MSE, RΒ² score monitoring
  • πŸ“ˆ Visual Analytics: Automated plot generation and publishing
  • πŸš€ Production Ready: Near-perfect model performance (RΒ² β‰ˆ 1.0)
  • πŸ“‹ Comprehensive Reporting: Detailed model configuration and metrics

πŸ—οΈ Project Structure

β”œβ”€β”€ .github/workflows/
β”‚   └── cml.yml                 # GitHub Actions CI/CD workflow
β”œβ”€β”€ model.py                    # Main ML training script  
β”œβ”€β”€ requirements.txt            # Python dependencies
β”œβ”€β”€ README.md                   # Project documentation
β”œβ”€β”€ metrics.txt                 # Generated model performance metrics
└── model_results.png          # Generated visualization plot

πŸš€ Quick Start

Prerequisites

  • Python 3.8+
  • GitHub repository with Actions enabled
  • Basic understanding of TensorFlow and MLOps

Setup Instructions

  1. Clone the repository:

    git clone https://github.com/dev-opsss/MLOps-CI.git
    cd MLOps-CI
  2. Install dependencies:

    pip install -r requirements.txt
  3. Run locally (optional):

    python model.py
  4. Enable GitHub Actions:

    • Push to your repository to trigger the automated pipeline
    • Check the Actions tab for workflow execution
    • View CML reports in PR comments

πŸ€– Model Architecture

Neural Network Design

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_shape=(1,))  # Single layer for linear regression
])

Key Specifications

  • Framework: TensorFlow 2.20+
  • Model Type: Sequential Neural Network
  • Architecture: Single Dense Layer (Linear Regression)
  • Optimizer: Adam (learning_rate=0.1)
  • Loss Function: Mean Squared Error
  • Training Epochs: 200
  • Data Normalization: StandardScaler applied

πŸ“Š Dataset Details

Synthetic Linear Data

  • Relationship: y = x + 10
  • Total Samples: 50
  • Feature Range: X ∈ [-100, 96] (step=4)
  • Target Range: y ∈ [-90, 106] (step=4)
  • Train/Test Split: 70/30 (shuffled)
  • Validation Split: 20% of training data

Data Preprocessing

  • Random shuffling to prevent extrapolation issues
  • Feature standardization for stable training
  • Proper tensor reshaping for TensorFlow compatibility

🎯 Performance Metrics

Exceptional Results Achieved

Mean Absolute Error = 0.000709
Mean Squared Error = 0.000001  
RΒ² Score = 1.000000
Final Training Loss = 4.62e-10
Final Validation Loss = 1.89e-10

Model Performance Indicators

  • βœ… Near-perfect accuracy (MAE < 0.001)
  • βœ… Perfect correlation (RΒ² = 1.0)
  • βœ… No overfitting (validation loss β‰ˆ training loss)
  • βœ… Production ready performance levels

πŸ”„ CI/CD Pipeline

GitHub Actions Workflow

The automated pipeline (/.github/workflows/cml.yml) performs:

  1. Environment Setup

    • Ubuntu latest runner
    • Python dependencies installation
    • CML tools configuration
  2. Model Training

    • Execute model.py script
    • Generate performance metrics
    • Create visualization plots
  3. Report Generation

    • Publish model results visualization
    • Create comprehensive performance report
    • Post automated comments on PRs

Workflow Triggers

  • Push events: Any commit to main branch
  • Pull requests: Automatic model evaluation on PRs
  • Manual dispatch: On-demand workflow execution

πŸ“ˆ Visualization & Reporting

Automated Plots

The pipeline generates publication-ready visualizations showing:

  • Training data points (blue scatter)
  • Test data points (green scatter)
  • Model predictions (red scatter)
  • True relationship line (black dashed)
  • Performance metrics overlay

CML Reports

Automated GitHub comments include:

  • πŸ“Š Model Performance Metrics
  • πŸ“ˆ Training Result Visualizations
  • πŸ”§ Model Configuration Details
  • πŸ“‹ Training Process Summary
  • 🎯 Results Analysis & Status

πŸ› οΈ Configuration

Requirements (requirements.txt)

tensorflow>=2.20.0
numpy>=2.3.0
matplotlib>=3.10.0

Key Model Parameters

# Training Configuration
EPOCHS = 200
LEARNING_RATE = 0.1
BATCH_SIZE = 35  # Full batch training
VALIDATION_SPLIT = 0.2
TRAIN_TEST_SPLIT = 0.7

# Data Configuration  
RANDOM_SEED = 42
FEATURE_RANGE = (-100, 96)
STEP_SIZE = 4

πŸ”§ Development Guidelines

Running Locally

# Install dependencies
pip install -r requirements.txt

# Execute training script
python model.py

# View generated files
ls -la *.png *.txt

Modifying the Model

To experiment with different architectures:

# Example: Multi-layer network
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(1,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1)
])

Custom Datasets

To use your own data:

# Replace synthetic data generation
X = your_features.reshape(-1, 1)
y = your_targets.reshape(-1, 1)

πŸ“‹ Troubleshooting

Common Issues & Solutions

1. GitHub Actions Permission Errors

# Add to workflow permissions
permissions:
  contents: read
  pull-requests: write
  issues: write

2. TensorFlow Version Compatibility

# Ensure compatible versions
pip install tensorflow>=2.20.0

3. CML Report Generation Issues

# Use heredoc syntax for complex reports
cat << 'EOF' >> report.md
# Your markdown content
EOF

πŸ† Project Achievements

Technical Accomplishments

  • βœ… Perfect Model Performance: RΒ² = 1.000000
  • βœ… Automated MLOps Pipeline: End-to-end automation
  • βœ… Comprehensive Testing: Training & validation monitoring
  • βœ… Production Readiness: Sub-millimeter precision
  • βœ… Reproducible Workflows: Version-controlled ML pipeline

Best Practices Implemented

  • Data normalization for training stability
  • Proper train/test splitting with shuffling
  • Comprehensive metrics tracking (MAE, MSE, RΒ²)
  • Automated visualization generation
  • CI/CD integration with GitHub Actions
  • Version control for ML experiments

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/improvement)
  3. Commit your changes (git commit -am 'Add improvement')
  4. Push to the branch (git push origin feature/improvement)
  5. Create a Pull Request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • TensorFlow Team for the excellent ML framework
  • Iterative.ai for CML (Continuous Machine Learning)
  • GitHub for Actions CI/CD platform
  • Open Source Community for inspiration and best practices

πŸ“Š Latest Results

Last Updated: Automatically updated by CML workflow

For the most recent model performance and visualizations, check the latest GitHub Actions run or PR comments.


This project demonstrates production-ready MLOps practices with automated model training, evaluation, and reporting. Perfect for learning CI/CD for machine learning workflows! πŸš€

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages