# YouTube Sentiment Analysis

## Table of Contents

- Project Overview
- Architecture
- Installation
- Usage
- Project Structure
- Deployment on Hugging Face Spaces
- Configuration
- Notes
- Known Issues
- Author
## Project Overview

This project implements a complete MLOps pipeline to analyze the sentiment of YouTube comments. It includes:

- A Machine Learning model (Logistic Regression with TF-IDF) trained on Reddit data
- A FastAPI backend to serve the model
- A Chrome extension for the user interface

This project demonstrates how to put an ML model into production as part of an MLOps course.
## Architecture

The project is organized into several components:

- **Machine Learning Engine** (`src/models/`): sentiment classification model (Logistic Regression + TF-IDF)
- **Data Processing** (`src/data/`): scripts to download and process data
- **Backend API** (`src/api/`): FastAPI with a `/predict_batch` endpoint
- **Chrome Extension** (`chrome-extension/`): extension to extract and analyze YouTube comments
- **Deployment**: Dockerized application ready for Hugging Face Spaces
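The classification core described above can be sketched with scikit-learn. The actual training code lives in `src/models/train_model.py`; the snippet below is only an illustrative sketch, and the toy data is an assumption:

```python
# Illustrative TF-IDF + Logistic Regression pipeline, mirroring the
# architecture above (toy data, not the project's real Reddit dataset).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = ["I love this video", "Great content, thanks", "This is terrible", "Worst video ever"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

model = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(texts, labels)

print(model.predict(["I love it", "terrible content"]))
```

Wrapping the vectorizer and classifier in a single `Pipeline` lets the whole object be serialized with joblib and served as one artifact.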
## Installation

### Prerequisites

- Python 3.10 or higher
- Google Chrome
- Git
### Steps

1. Clone the repository:

   ```bash
   git clone https://github.com/TALEB7/YouTube-Sentiment-Analysis.git
   cd YouTube-Sentiment-Analysis
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   # On Windows: venv\Scripts\activate
   # On Linux/Mac: source venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Initialize the project structure (optional):

   ```bash
   python setup.py
   ```

5. Download and prepare the data:

   ```bash
   python src/data/download_data.py
   python src/data/process_data.py
   ```

6. Train the model:

   ```bash
   python src/models/train_model.py
   ```

   ⚠️ Note: Training may take several minutes depending on your machine.

7. Run the API:

   ```bash
   python -m uvicorn src.api.main:app --reload
   ```

   The API will be accessible at `http://127.0.0.1:8000`.
## Usage

### Load the Chrome extension

1. Open Chrome and go to `chrome://extensions/`
2. Enable "Developer mode" (top right)
3. Click "Load unpacked"
4. Select the `chrome-extension/` folder from this project

### Analyze a video's comments

1. Make sure the API is running (locally or on Hugging Face)
2. Go to a YouTube video page
3. Scroll down to load comments
4. Click the extension icon
5. Click "Analyze Comments"
6. View the sentiment distribution and the analysis of each comment
### Test the API

You can test the API with the provided script:

```bash
python test_api.py
```

Or use curl:

```bash
curl -X POST "http://127.0.0.1:8000/predict_batch" \
  -H "Content-Type: application/json" \
  -d '{"comments": [{"id": "1", "text": "This video is great!"}]}'
```

## Project Structure

```
YouTube-Sentiment-Analysis/
├── src/
│   ├── api/              # FastAPI backend
│   ├── data/             # Data processing scripts
│   └── models/           # Training scripts
├── chrome-extension/     # Chrome extension
├── data/                 # Data (raw and processed)
├── models/               # Trained models
├── app.py                # Entry point for Hugging Face
├── Dockerfile            # Docker configuration
├── requirements.txt      # Python dependencies
└── README.md             # This file
```

## Deployment on Hugging Face Spaces

To deploy on Hugging Face Spaces:
1. Create a new Space (SDK: Docker)
2. Upload the repository contents

   ⚠️ Important: include the `models/sentiment_model.joblib` file (or train it during the build)

3. The Space will build and launch the API automatically
4. Update the URL in `chrome-extension/popup.js`:

   ```javascript
   const apiUrl = "https://your-space.hf.space/predict_batch";
   ```
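The repository already ships its own `Dockerfile`, so the fragment below is only an illustration of what a Docker-SDK Space typically needs, assuming `app.py` exposes the FastAPI instance as `app`:

```dockerfile
# Illustrative Dockerfile for a Docker-SDK Space (the project's real
# Dockerfile may differ).
FROM python:3.10-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Hugging Face Spaces routes traffic to port 7860
EXPOSE 7860
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
```

Listening on port 7860 matters: Spaces expects the containerized app on that port by default.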
## Configuration

The main parameters are in `config.py`. You can modify:

- File paths
- Model parameters
- API URLs
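As a rough illustration, a `config.py` of this kind typically centralizes values like the ones below. The variable names and file names here are hypothetical, not the project's actual identifiers:

```python
# Hypothetical sketch of config.py contents; the project's real
# variable names and paths may differ.
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent
MODEL_PATH = BASE_DIR / "models" / "sentiment_model.joblib"
PROCESSED_DATA_DIR = BASE_DIR / "data" / "processed"
API_URL = "http://127.0.0.1:8000/predict_batch"

# Model hyperparameters (illustrative defaults)
TFIDF_MAX_FEATURES = 10_000
LOGREG_MAX_ITER = 1000
```

Keeping paths and hyperparameters in one module means the training script, the API, and the tests all read the same values.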
## Notes

- The model is trained on Reddit data, so it may not transfer perfectly to YouTube comments
- The Chrome extension may need adjustments if YouTube changes its HTML structure
- For better performance, a pre-trained model (e.g., BERT) could be used
## Known Issues

- If the extension doesn't find comments, make sure you've scrolled down to load them
- The model may take a few seconds to load when the API starts
- On Hugging Face, make sure the model file is included in the build
## Author

- FARDAOUI Ilyas