This repository contains the source code, documentation, and deliverables for a Video Game & Reviews ML/AI project. The goal is to process and analyze video game data and reviews. It performs sentiment classification, clusters game genres, and generates summaries to recommend top games.
- Sentiment Analysis:
  - Classifies reviews as Positive, Neutral, or Negative.
  - Uses a DistilBERT model fine-tuned on the project's review dataset.
- Category Clustering:
  - Groups games into broader genre categories, such as Combat-Focused Gameplay.
  - Enables better data organization and visualization.
- Review Summarization:
  - Generates blog-like articles summarizing game features.
  - Highlights the top three games per cluster and the reasons players like or dislike them.
- Interactive Website:
  - Presents all analyses in an intuitive, user-friendly interface.
  - Allows live sentiment processing of user review texts.
- Primary Dataset: Custom video game review dataset.
root/
├── core/                # Core Django app
│   └── templates/       # HTML index view
├── data/                # Raw and processed datasets
├── models/              # Saved and fine-tuned models
│   ├── clustering/      # pyLDAvis visualization
│   └── sentiment/       # DistilBERT files
├── notebooks/           # Jupyter notebooks for model development
├── scripts/             # Python scripts for feeding the database
├── served_model/        # Flask app serving TinyBERT
├── static/              # Static files (CSS, JS, images)
├── db.sqlite3           # Django SQLite database
├── manage.py            # Django CLI utility script
├── README.md            # Project documentation
└── requirements.txt     # Python dependencies
- Preprocessing:
  - Text Cleaning: Removes special characters and standardizes text.
  - Data Cleaning: Drops unnecessary columns and handles missing values.
  - Enrichment: Adds genres via the OpenAI API.
  - Balancing: Applies upsampling and downsampling.
  - Normalization: Performs lemmatization and stopword removal.
  - Tokenization & Vectorization: Prepares text for modeling.
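
A minimal sketch of the text cleaning, stopword removal, and balancing steps listed above, using only the standard library. The function names, the tiny `STOPWORDS` set, and the choice of the median class size as the resampling target are assumptions made for illustration, not the project's actual code; lemmatization and vectorization are left out.

```python
import random
import re

# Hypothetical stopword list; a real pipeline would likely use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "of", "to"}


def clean_text(text):
    """Lowercase, replace special characters with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def remove_stopwords(text):
    """Drop words that appear in STOPWORDS."""
    return " ".join(w for w in text.split() if w not in STOPWORDS)


def balance(rows, label_key="label", target=None, seed=0):
    """Resample every class to `target` rows.

    Larger classes are downsampled without replacement; smaller classes
    are upsampled by drawing extra rows with replacement.
    """
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    if target is None:
        sizes = sorted(len(group) for group in by_label.values())
        target = sizes[len(sizes) // 2]  # assumed default: median class size
    balanced = []
    for group in by_label.values():
        if len(group) >= target:
            balanced.extend(rng.sample(group, target))
        else:
            balanced.extend(group + rng.choices(group, k=target - len(group)))
    return balanced
```

For example, `remove_stopwords(clean_text(review))` could be applied to each review before tokenization, and `balance(rows)` to the labeled rows before training.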
- Model Pipeline:
  - Sentiment Classification: Uses fine-tuned DistilBERT for sentiment analysis.
  - Topic Modeling: Employs LDA to uncover hidden topics and group similar game genres.
  - Summarization: Utilizes the OpenAI API for concise summaries of game pros and cons.
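
To illustrate how the pipeline stages could connect, the sketch below turns raw classifier scores into one of the three sentiment labels and then picks the top games per cluster to pass on to summarization. The label order in `LABELS`, the `(cluster, game, label)` input shape, and ranking by share of Positive reviews are all assumptions; the README does not state how the project ranks games.

```python
import math
from collections import defaultdict

# Assumed label order; the real order depends on how the model was fine-tuned.
LABELS = ("Negative", "Neutral", "Positive")


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def top_games_per_cluster(reviews, k=3):
    """Rank games inside each cluster by their share of Positive reviews.

    `reviews` is an iterable of (cluster, game, label) tuples.
    Ties are broken alphabetically by game name.
    """
    counts = defaultdict(lambda: [0, 0])  # (cluster, game) -> [positive, total]
    for cluster, game, label in reviews:
        entry = counts[(cluster, game)]
        entry[0] += label == "Positive"
        entry[1] += 1
    ranked = defaultdict(list)
    for (cluster, game), (positive, total) in counts.items():
        ranked[cluster].append((positive / total, game))
    return {
        cluster: [g for _, g in sorted(games, key=lambda t: (-t[0], t[1]))[:k]]
        for cluster, games in ranked.items()
    }
```

The resulting per-cluster lists could then be used to build the prompts sent to the summarization step.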
- Evaluation:
  - Metrics: Evaluates model performance using accuracy, precision, recall, and F1-score.
  - Visualization: Includes confusion matrix, word clouds, and t-SNE plot.
  - Analysis: Displays example predictions and enables interactive topic exploration with pyLDAvis.
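
For reference, the metrics named above can be computed from true and predicted labels as in the following sketch. The function name `evaluate` and the use of macro averaging for F1 are assumptions; the notebooks may rely on a library such as scikit-learn instead, which offers equivalent functions.

```python
def evaluate(y_true, y_pred, labels):
    """Return accuracy, macro-averaged F1, and per-class precision/recall/F1."""
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(labels)
    return accuracy, macro_f1, per_class
```

Macro averaging weights the three sentiment classes equally, which is one reasonable choice when the classes have been balanced beforehand.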
- Deployment:
  - Web Interface: Built with Django.
  - Model Serving: Sentiment model served via Flask.
  - Hosting: Entire application hosted on Heroku.
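
One way the Django side could talk to the Flask model service is a small JSON-over-HTTP client like the sketch below. The URL, the `/predict` route, and the `text`, `label`, and `score` field names are hypothetical; the README does not document the service's actual interface.

```python
import json
import urllib.request

# Hypothetical address and route of the Flask service (not given in this README).
MODEL_URL = "http://localhost:5000/predict"


def build_request(text, url=MODEL_URL):
    """Build a POST request carrying the review text as JSON."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(body):
    """Extract (label, score) from the assumed JSON response body."""
    data = json.loads(body.decode("utf-8"))
    return data["label"], float(data["score"])


def classify(text, url=MODEL_URL, timeout=5):
    """Send the text to the model service and return (label, score)."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return parse_response(resp.read())
```

Keeping the model behind its own service means the web app does not need to load the model weights itself; a Django view would only call something like `classify(review_text)`.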
- Source Code:
  - Organized Python scripts and Jupyter notebooks.
- Website:
  - Live demo hosted (RoboReviews).
- Evaluation Metrics:
  - Visualizations: Plots (images/notebooks) and LDA visualization rendered as HTML.
- Sentiment Predictions: Users can test written texts for sentiment.
- Review Analysis: View categorized and summarized results.
- Extend datasets for broader coverage.
- Fine-tune and host an LLM for game summarization.
- Datasets from UCSD.
- Pretrained models from Hugging Face.