Welcome to the Wine Quality Prediction project! This repository contains an end-to-end machine learning pipeline built on the Wine Quality dataset. It uses MLflow and DagsHub to track experiments and Flask to serve predictions through a simple web interface.
- Project Overview
- Features
- Tech Stack
- Project Workflow
- Folder Structure
- Setup Instructions
- Usage
- License
This project demonstrates the complete lifecycle of a machine learning solution:
- Data Ingestion: Collect and preprocess the Wine Quality dataset.
- Model Training: Train machine learning models to predict wine quality.
- Model Evaluation: Evaluate the model's performance using metrics and track experiments with MLflow and DagsHub.
- Deployment: Deploy the model using Flask to provide a simple and interactive user interface.
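The evaluation step above reports metrics for the trained model, but the README does not name them. A minimal stdlib sketch, assuming wine quality is treated as a regression target and the metrics are RMSE, MAE, and R² (scikit-learn's `mean_squared_error`, `mean_absolute_error`, and `r2_score` compute the same quantities):

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute RMSE, MAE and R^2 for a regression model's predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 is undefined when every target is identical; this sketch returns 0.0 then
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 0.0
    return {"rmse": rmse, "mae": mae, "r2": r2}
```

The returned dictionary has the `{name: value}` shape that `mlflow.log_metrics` accepts, so the evaluation stage could log it in a single call.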
Key features:
- End-to-End ML Pipeline: From data ingestion to deployment.
- Experiment Tracking: Integrated with MLflow and DagsHub for tracking experiments and model performance.
- Interactive UI: A simple Flask-based web interface for users to input data and get predictions.
- Configurable Workflow: YAML-based configuration for easy customization.
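The experiment-tracking integration relies on DagsHub hosting an MLflow tracking server for each repository at `https://dagshub.com/<owner>/<repo>.mlflow`. A sketch of pointing MLflow at it through environment variables; the owner, repository, and credential values are placeholders, and the helper name is hypothetical rather than part of this repository:

```python
import os
from typing import Optional


def configure_dagshub_tracking(
    owner: str,
    repo: str,
    username: Optional[str] = None,
    token: Optional[str] = None,
) -> str:
    """Point MLflow at the tracking server DagsHub hosts for a repository.

    MLflow consults MLFLOW_TRACKING_URI when no tracking URI has been set
    explicitly, so call this before the first mlflow logging call.
    """
    uri = f"https://dagshub.com/{owner}/{repo}.mlflow"
    os.environ["MLFLOW_TRACKING_URI"] = uri
    if username and token:
        # MLflow sends these as HTTP basic-auth credentials to the remote server
        os.environ["MLFLOW_TRACKING_USERNAME"] = username
        os.environ["MLFLOW_TRACKING_PASSWORD"] = token
    return uri
```

In practice the token would be read from a CI secret or a local environment variable rather than written into source code.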
The project uses the following tech stack:
- Programming Language: Python
- Libraries: Pandas, NumPy, Scikit-learn, Flask
- Experiment Tracking: MLflow, DagsHub
- Deployment: Flask
- Version Control: Git and GitHub
- Containerization: Docker
The project is divided into the following stages:
- Data Ingestion: Load and preprocess the dataset.
- Data Validation: Validate the dataset schema using schema.yaml.
- Data Transformation: Perform feature engineering and preprocessing.
- Model Training: Train and save the model.
- Model Evaluation: Evaluate the model and log metrics using MLflow.
- Deployment: Deploy the model using Flask for real-time predictions.
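The validation stage checks the data against schema.yaml, whose layout is not shown here. As a sketch, assuming the schema parses to a mapping from column name to a dtype string such as `float64` or `int64`, a stdlib-only check could look like this. The original UCI Wine Quality CSV files use `;` as the delimiter, hence the `delimiter` parameter; `validate_csv` and `PARSERS` are hypothetical names.

```python
import csv
import io
from typing import Dict, List

# Parsers for the dtype names assumed to appear in schema.yaml
PARSERS = {"float64": float, "int64": int}


def validate_csv(text: str, schema: Dict[str, str], delimiter: str = ",") -> List[str]:
    """Check CSV columns and cell types against a {column: dtype} schema.

    Returns human-readable problems; an empty list means the data is valid.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in schema if c not in header]
    problems += [f"unexpected column: {c}" for c in header if c not in schema]
    problems += [f"unsupported dtype for {c}: {d}" for c, d in schema.items() if d not in PARSERS]

    # Only type-check columns that are present and have a known dtype
    checks = {c: PARSERS[d] for c, d in schema.items() if c in header and d in PARSERS}
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        for col, parse in checks.items():
            value = row[col]
            try:
                parse(value)
            except (TypeError, ValueError):  # TypeError: a short row leaves the cell as None
                problems.append(f"row {row_no}: {col!r} is not a valid {schema[col]}: {value!r}")
    return problems
```

Collecting every problem instead of stopping at the first makes the validation report more useful when a new data drop has several issues at once.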
To modify or extend the pipeline:
- Update the configuration files: config.yaml, schema.yaml, and params.yaml.
- Implement the pipeline components in the src directory.
- Run the pipeline using main.py.
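main.py is the pipeline's entry point. A minimal sketch of how such an entry point might run the stages in order and stop at the first failure, assuming each stage is exposed as a zero-argument callable; the placeholder stages below are hypothetical, not the repository's actual classes:

```python
import logging
from typing import Callable, List, Sequence, Tuple

logging.basicConfig(level=logging.INFO, format="[%(asctime)s] %(levelname)s: %(message)s")
logger = logging.getLogger("datascience")

# A stage is a display name plus a zero-argument callable that does the work
Stage = Tuple[str, Callable[[], None]]


def run_pipeline(stages: Sequence[Stage]) -> List[str]:
    """Run stages in order, logging progress; re-raise the first failure."""
    completed: List[str] = []
    for name, run in stages:
        logger.info(">>>>> stage %s started <<<<<", name)
        try:
            run()
        except Exception:
            # logger.exception records the traceback before the error propagates
            logger.exception("stage %s failed", name)
            raise
        logger.info(">>>>> stage %s completed <<<<<", name)
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Hypothetical placeholders; the real stages live under src/datascience/
    stages: List[Stage] = [
        ("Data Ingestion", lambda: None),
        ("Data Validation", lambda: None),
        ("Data Transformation", lambda: None),
        ("Model Training", lambda: None),
        ("Model Evaluation", lambda: None),
    ]
    run_pipeline(stages)
```

Stopping at the first failed stage keeps later stages from running on missing or invalid artifacts.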
The repository is organized as follows:

DsProject/
├── .github/workflows/ # CI/CD workflows
├── config/ # Configuration files (config.yaml, schema.yaml, params.yaml)
├── research/ # Notebooks for exploratory data analysis
├── src/datascience/ # Source code for ML pipeline
├── templates/ # HTML templates for Flask UI
├── app.py # Flask application
├── main.py # Entry point for the ML pipeline
├── requirements.txt # Python dependencies
├── Dockerfile # Docker configuration
├── setup.py # Package setup
└── README.md # Project documentation
To set up the project locally:
- Clone the repository:
    git clone https://github.com/adiManethia/DsProject.git
    cd DsProject
- Create a virtual environment and activate it:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install the dependencies:
    pip install -r requirements.txt
- Run the pipeline:
    python main.py
- Start the Flask app:
    python app.py
After running the pipeline and starting the app:
- Experiment Tracking: Use MLflow and DagsHub to monitor model performance.
- Web Interface: Enter wine features in the Flask UI to get quality predictions.
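The web interface collects wine features and passes them to the model. A minimal sketch of the conversion step, assuming the HTML form uses the hypothetical snake_case field names below (one per UCI Wine Quality input) and that the model expects the columns in this order:

```python
from typing import List, Mapping

# The 11 physicochemical inputs of the UCI Wine Quality dataset, in a fixed order.
# The snake_case form-field names are hypothetical; they must match the HTML template.
FEATURES = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def form_to_features(form: Mapping[str, str]) -> List[float]:
    """Convert submitted form values into one feature row in model column order."""
    missing = [name for name in FEATURES if not form.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing inputs: {', '.join(missing)}")
    try:
        return [float(form[name]) for name in FEATURES]
    except ValueError as exc:
        raise ValueError(f"non-numeric input: {exc}") from exc
```

Inside a Flask view, `request.form` (a dict-like `MultiDict`) could be passed directly, and the returned list wrapped as a single row, `[row]`, for a scikit-learn style `predict` call. Keeping the order in one list avoids silently mismatching columns between training and serving.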
This project is licensed under the GPL-3.0 License.
Feel free to contribute to this project by submitting issues or pull requests.