Skip to content

Bootcamp-IA-P4/nlp-team2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

139 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Modzilla: API for comment moderation on YouTube

Descripción de la imagen

📚 Table of Contents

  1. 🧾 Overview
  2. 🎯 System Purpose and Scope
  3. 🔍 Main Features
  4. 💡 Possible Improvements
  5. 📁 Folder Structure
  6. 🏗️ Architecture Diagram
  7. ⚙️ Installation and Usage
  8. 🚀 Deployment
  9. 🧑‍💻 Collaborators

🧾 Overview

This document provides a technical overview of the nlp-team2 repository, a machine learning system designed for classifying comments of youtube videos as toxic or not toxic based on NLP (Natural Language Processing) tecnology. The repository implements a complete pipeline from data exploration to model deployment, focusing on ensuring high accuracy and recall for toxic comments detection.

🎯 System Purpose and Scope

The nlp-team2 repository serves as a comprehensive mushroom classification system that:

  1. Analyzes relationships between youtube comments characteristics and toxici
  2. Trains and evaluates multiple classification models
  3. Deploys the best-performing model for inference
  4. Provides a structured input validation system to ensure reliable predictions
  5. Deploys a functional, intuitive API with great analysis capacity.

🔍 Main Features

✅ Complete EDA with visualizations to understand variable relationships.

✅ Trained Transformer model to predict whenever a comment is toxic or not

✅ Connecting to the API

✅ Connecting the backend with the frontend

✅ Implement solution using CSS, HTML, and Vanilla JavaScript

✅ Create connection with the database

✅ Dockerized version of the program.

✅ Model in production

💡 Possible Improvements

⏩ Analysis of other social networks

⏩ Higher speed in loading results

⏩ Experiments or deployments with neural network models.

📁 Folder Structure

📂 Mushroom-Classifier/

nlp-team2/
├── .github/
│   └── pull_request_template.md
├── .gitignore
├── .qodo/
│   └── testConfig.toml
├── client/
│   ├── .gitignore
│   ├── Dockerfile
│   ├── README.md
│   ├── eslint.config.js
│   ├── index.html
│   ├── package-lock.json
│   ├── package.json
│   ├── postcss.config.js
│   ├── public/
│   │   ├── img/
│   │   │   ├── gifgodzilla.gif
│   │   │   ├── icongojira.png
│   │   │   ├── logoFull.png
│   │   │   ├── logoPet.png
│   │   │   └── logofont.png
│   │   ├── index.html
│   │   └── vite.svg
│   ├── src/
│   │   ├── App.css
│   │   ├── App.jsx
│   │   ├── Main.jsx
│   │   ├── assets/
│   │   │   └── react.svg
│   │   ├── components/
│   │   │   ├── AnalyzeTab.jsx
│   │   │   ├── AppMetadata.jsx
│   │   │   ├── Dashboard.jsx
│   │   │   ├── GuideTab.jsx
│   │   │   ├── Header.jsx
│   │   │   ├── HistoryTab.jsx
│   │   │   ├── ProgressLoader.jsx
│   │   │   ├── Sidebar.jsx
│   │   │   └── ToxicityBadge.jsx
│   │   ├── contexts/
│   │   │   └── ThemeContext.jsx
│   │   ├── hooks/
│   │   │   ├── useApiData.js
│   │   │   └── useAppInfo.js
│   │   ├── index.css
│   │   └── utils/
│   │       └── mockData.js
│   ├── tailwind.config.js
│   ├── tests/
│   │   ├── App.test.jsx
│   │   ├── README.md
│   │   ├── basic-component.test.jsx
│   │   ├── basic.test.js
│   │   ├── components/
│   │   │   ├── Dashboard.test.jsx
│   │   │   ├── Header.test.jsx
│   │   │   ├── Sidebar.test.jsx
│   │   │   └── ThemeContext.test.jsx
│   │   └── setup/
│   │       ├── apiMocks.js
│   │       └── setupTests.js
│   │   └── utils/
│   │       └── mockData.test.js
│   ├── vite.config.js
│   └── vitest.config.js
├── docker-compose.yml
├── eda/
│   └── eda_nlp_(1).ipynb
├── mlFlow/
│   ├── NOTES.md
│   ├── README.md
│   ├── data/
│   │   └── raw/
│   │       ├── hatespeech.csv
│   │       ├── homophobia_human_like_1000.csv
│   │       └── youtoxic_enriched_full.csv
│   ├── enviroment.yml
│   ├── experiments/
│   │   ├── demo_feature_engineering.py
│   │   ├── mlflow_experiments.py
│   │   └── test_transformer_simple.py
│   ├── notebooks/
│   │   ├── eda-nlp.ipynb
│   │   ├── mlflow-experiments.ipynb
│   │   └── model-comparison.ipynb
│   ├── requirements.txt
│   ├── scripts/
│   │   ├── check_experiments.py
│   │   ├── save_current_model.py
│   │   ├── test_backup_model.py
│   │   └── test_model_predictions.py
│   └── src/
│       ├── __init__.py
│       ├── data_preprocessing.py
│       ├── feature_engineering.py
│       ├── model_utils.py
│       └── transformer_models_clean.py
├── requirements.txt
├── server/
│   ├── Dockerfile
│   ├── __main__.py
│   ├── core/
│   │   ├── LOGGING_GUIDE.md
│   │   ├── __init__.py
│   │   ├── config.py
│   │   └── print_dev.py
│   ├── database/
│   │   ├── __init__.py
│   │   ├── db_manager.py
│   │   └── models.py
│   ├── main.py
│   ├── ml/
│   │   ├── __init__.py
│   │   ├── api/
│   │   │   ├── __init__.py
│   │   │   └── toxicity_routes.py
│   │   ├── pipeline.py
│   │   └── predictor.py
│   ├── scraper/
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── progress_manager.py
│   │   ├── scrp.py
│   │   └── scrp_socket.py
│   └── tests/
│       ├── README.md
│       ├── conftest.py
│       ├── pytest.ini
│       ├── run_coverage.ps1
│       ├── run_coverage.sh
│       ├── test_database.py
│       ├── test_main.py
│       ├── test_print_dev.py
│       └── test_scrp.py

🏗️ Architecture Diagram

Descripción de la imagen

⚙️ Installation and Usage

1️⃣ Clone the repository

git clone [https://github.com/Bootcamp-IA-P4/nlp-team2)
cd nlp-team2

2️⃣ Create and activate the virtual environment

python -m venv .venv
source .venv/bin/activate   # On Linux/MacOS
.venv\Scripts\activate     # On Windows

3️⃣ Install dependencies

pip install -r requirements.txt

4️⃣ Run app

uvicorn server.main:app --reload

5️⃣ Run test

Readme Frontend tests

Readme Backend tests

🚀 Deployment

  • The model can be tested on render, with the following links:

    Frontend

    Backend

🧑‍💻 Collaborators

This project was developed by the following contributors:

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors