A personal learning project showcasing an end-to-end AutoML pipeline with an interactive UI and distributed training.

SubhojitGhimire/autoML

AutoML Playground

An interactive web application for automated machine learning, built with Streamlit. This tool allows users to upload their data, perform feature engineering, and train models for various ML tasks, including Time Series Forecasting, Regression, Classification, and Outlier Detection. It leverages Ray for distributed, parallel training to accelerate the model building process.

Image: Upload Data

Key Features

  • Interactive UI: A multi-page Streamlit application that guides the user from data upload to model training.

  • Multiple ML Tasks: Supports a wide range of tasks:

    • Time Series Forecasting
    • Regression
    • Classification
    • Outlier/Anomaly Detection
  • Diverse Algorithm Selection: Implements popular and powerful algorithms for each task.

    | Machine Learning Task | Supported Algorithms |
    |---|---|
    | Time-Series Forecasting | ARIMA, SARIMA, Neural Prophet, Attention-Based LSTM, XGBoost Forecaster |
    | Regression | XGBoost Regressor, Random Forest Regressor, Linear Regression, SVM Regressor (SVR) |
    | Classification | XGBoost Classifier, Logistic Regression, Linear SVM Classifier (LinearSVC) |
    | Outlier Detection | CatBoost Outlier Detector, Local Outlier Factor (LOF), One-Class SVM |
  • Hyperparameter Tuning:

    • Auto-tuning: Utilizes Bayesian Optimization (scikit-optimize) to automatically find the best hyperparameters.
    • Manual Tuning: Provides an interface for users to manually set and experiment with algorithm parameters.
  • Distributed Computing: Uses the Ray framework to distribute model training across multiple processes or nodes, which is useful for large datasets or partitioned data.

  • Comprehensive Preprocessing: Offers UI-based controls for:

    • Missing value imputation
    • Feature scaling (Standardization & Normalization)
    • Data sorting and randomization
    • Datetime format conversion
  • Real-time Progress Tracking: A dedicated "Progress History" page to monitor active training jobs and review completed ones.

  • Data Partitioning: Ability to train models in parallel on different segments of the data (e.g., for different stores or products).
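
The per-partition training idea can be illustrated with a small sketch. The project itself uses Ray for this; the example below deliberately swaps in Python's standard-library `concurrent.futures` so that it runs without extra dependencies. The toy rows, the function names, and the mean-predictor "model" are all hypothetical stand-ins, not the repository's code.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical toy data: (partition_key, label) rows, e.g. sales per store.
ROWS = [
    ("store_a", 10.0), ("store_a", 12.0), ("store_a", 14.0),
    ("store_b", 100.0), ("store_b", 110.0),
]

def split_by_partition(rows):
    """Group label values by their partition key."""
    groups = defaultdict(list)
    for key, label in rows:
        groups[key].append(label)
    return dict(groups)

def train_partition(key, labels):
    """Stand-in for model training: 'fit' a mean predictor for one partition."""
    return key, {"n_rows": len(labels), "prediction": mean(labels)}

def train_all(rows, max_workers=4):
    """Train one model per partition concurrently and collect the results."""
    groups = split_by_partition(rows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per partition; each task only sees its own segment of data.
        futures = [pool.submit(train_partition, k, v) for k, v in groups.items()]
        return dict(f.result() for f in futures)

models = train_all(ROWS)
print(models)
```

With Ray, the same shape would typically be expressed by decorating the training function with `@ray.remote`, launching one task per partition with `.remote(...)`, and gathering the results with `ray.get(...)`. How this project wires that up is not shown here.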

Requirements and Usage

Clone the repository. From inside the repo folder, install the dependencies:

python -m pip install --upgrade -r requirements.txt

Run the Streamlit application:

streamlit run main.py

The application should now open in your web browser. It is designed to be intuitive: the sidebar guides you through the following steps.

  1. Upload Data: Start by uploading your dataset (CSV or Excel) or providing a local file path.
  2. Data Overview: Get a summary of your data, including its shape, data types, and a correlation matrix to understand feature relationships.
  3. Feature Engineering:
    • Select your feature, label, and (optional) datetime and partition columns.
    • Apply preprocessing steps like handling missing values, scaling features, or sorting data.
  4. Model Configuration:
    • Choose the ML task you want to perform.
    • Select an algorithm from the list of supported models.
    • Choose your hyperparameter tuning strategy ("Auto" for Bayesian hyperparameter tuning or "Manual" for user-specified values).
    • Define your train/test split strategy (percentage-based or time-based).
  5. Summary: Review all your configurations in one place. You can download the configuration as a JSON file for reproducibility.
  6. Run AutoML Training: Click the button to start the training process. The job will be sent to the Ray backend for execution.
  7. Progress History: Navigate to this page to see your model's training progress in real time or view the results of completed jobs. You can download the prediction outputs from here.
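
The two train/test split strategies mentioned in the Model Configuration step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the `date` key, and the toy rows are all hypothetical.

```python
from datetime import date

def percentage_split(rows, train_fraction=0.8):
    """Percentage-based split: the first train_fraction of rows go to train."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def time_split(rows, cutoff, time_key="date"):
    """Time-based split: rows before the cutoff train, the rest test."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    train = [r for r in ordered if r[time_key] < cutoff]
    test = [r for r in ordered if r[time_key] >= cutoff]
    return train, test

# Hypothetical toy data: ten daily observations.
rows = [{"date": date(2024, 1, d), "y": float(d)} for d in range(1, 11)]

pct_train, pct_test = percentage_split(rows, 0.8)
t_train, t_test = time_split(rows, cutoff=date(2024, 1, 8))
print(len(pct_train), len(pct_test))
print(len(t_train), len(t_test))
```

A time-based split keeps every test row later than every training row, which is usually what a forecasting task needs. A percentage-based split is the simpler choice when row order carries no meaning, typically after shuffling.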

Screenshots

Image: Data Overview
Image: Feature Engineering
Image: Model Configuration
Image: Configuration Summary
Image: Progress History Active Model
Image: Progress History Completed Model
Image: Terminal

*This README.md file has been improved for overall readability (grammar, sentence structure, and organization) using AI tools.
