Sentiment Analysis of Movie Reviews Using Machine Learning

Overview

This repository contains a machine learning project that performs sentiment analysis on IMDB movie reviews. The project classifies reviews as either positive or negative using a Naive Bayes model.

UML Diagram

The diagram in `NLP Project - UML.png` outlines the key steps in the machine learning pipeline.

Main Features

- Data Preprocessing: Converts reviews into a format suitable for model training by applying tokenization, stopword removal, and lemmatization.
- Model: Naive Bayes classifier trained with hyperparameter tuning.
- Evaluation: Model evaluated on small and large datasets, achieving an accuracy of over 84%.
- Prediction: Predicts sentiment for new movie reviews.
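
The preprocessing step listed above can be sketched roughly as follows. This is a standard-library illustration only, not the code in `data_preprocessing.py`: the stopword set is a tiny hand-picked sample, and `crude_lemma` is a hypothetical suffix-stripping stand-in for a real lemmatizer such as NLTK's `WordNetLemmatizer`.

```python
import re

# Assumption: a tiny illustrative stopword sample, not the list the project uses.
STOPWORDS = {"a", "an", "and", "the", "is", "was", "were", "it", "this", "of", "to", "in"}


def tokenize(text):
    """Strip HTML tags, lowercase the text, and split it into word tokens."""
    without_tags = re.sub(r"<[^>]+>", " ", text)
    return re.findall(r"[a-z]+(?:'[a-z]+)?", without_tags.lower())


def crude_lemma(token):
    """Hypothetical stand-in for a lemmatizer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stopwords, then normalize each remaining token."""
    return [crude_lemma(t) for t in tokenize(text) if t not in STOPWORDS]
```

Suffix stripping is much cruder than dictionary-based lemmatization, so it is only meant to show where that step sits in the pipeline.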

Setup and Installation

Important: Follow these instructions to set up and run the sentiment analysis project.

Prerequisites

- Python 3.x installed on your local machine.
- Libraries listed in requirements.txt.

Installation Steps

Clone the repository:

```
git clone https://github.com/Danielkis97/Sentiment-Analysis-of-Movie-Reviews-Using-Machine-Learning.git
cd Sentiment-Analysis-of-Movie-Reviews-Using-Machine-Learning
```

Set up a virtual environment (optional but recommended):

On Windows:

```
python -m venv venv
venv\Scripts\activate
```

On macOS/Linux:

```
python3 -m venv venv
source venv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```
Download the dataset: This project uses the IMDB Large Movie Review Dataset. After downloading the archive, extract it into a `data/` folder inside the project directory, alongside the other project files. The resulting directory structure should look like this:

```
project_directory/
├── train_model.py
├── predict_sentiment.py
├── data_preprocessing.py
├── load_data_big.py
├── load_data_small.py
├── requirements.txt
├── data/
│   ├── aclImdb/
│   │   ├── train/
│   │   │   ├── pos/
│   │   │   ├── neg/
│   │   ├── test/
│   │   │   ├── pos/
│   │   │   ├── neg/
```
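
Before running the loaders, it can help to confirm that the extracted dataset matches the layout above. The helper below is a minimal sketch and is not part of the repository; the function name `missing_dataset_dirs` is hypothetical.

```python
from pathlib import Path

# Sub-folders the tree above expects under data/aclImdb.
EXPECTED_DIRS = ["train/pos", "train/neg", "test/pos", "test/neg"]


def missing_dataset_dirs(project_dir):
    """Return the expected dataset folders that are absent under project_dir."""
    root = Path(project_dir) / "data" / "aclImdb"
    return [str(root / sub) for sub in EXPECTED_DIRS if not (root / sub).is_dir()]
```

Calling `missing_dataset_dirs(".")` from the project directory returns an empty list when all four folders are in place.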

Load the Data: Before training the model, you need to load the dataset. Run one of the following commands depending on whether you want to train with a large or small dataset:

```
python load_data_big.py  # For large dataset
python load_data_small.py  # For small dataset
```
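
For orientation, here is a sketch of how reviews could be read from the `pos/` and `neg/` folders. It is not the code in `load_data_big.py` or `load_data_small.py`: the function name, the 1/0 label encoding, and the `limit_per_class` parameter (one plausible way to get a "small" dataset) are all assumptions.

```python
from pathlib import Path


def load_reviews(split_dir, limit_per_class=None):
    """Read reviews from split_dir/pos and split_dir/neg.

    Returns (texts, labels), using 1 for positive and 0 for negative
    (an assumed encoding). limit_per_class caps the files read per folder.
    """
    texts, labels = [], []
    for label, name in ((1, "pos"), (0, "neg")):
        files = sorted((Path(split_dir) / name).glob("*.txt"))
        if limit_per_class is not None:
            files = files[:limit_per_class]
        for path in files:
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

With the layout shown earlier, the training split would be loaded with something like `load_reviews("data/aclImdb/train")`.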

Train the Model: To train the model, run:
```
python train_model.py
```
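
To illustrate what training a Naive Bayes classifier with a tuned hyperparameter involves, here is a pure-Python sketch of multinomial Naive Bayes with additive smoothing. It is not the implementation in `train_model.py`: it works on raw token counts rather than vectorized features, and the names `train_nb`, `predict_nb`, and `tune_alpha`, as well as the choice of `alpha` as the tuned parameter, are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on tokenized docs; alpha > 0 is the smoothing strength."""
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = sorted({w for counts in word_counts.values() for w in counts})
    model = {"log_prior": {}, "log_likelihood": {}}
    for label, n_docs in Counter(labels).items():
        model["log_prior"][label] = math.log(n_docs / len(labels))
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return model


def predict_nb(model, tokens):
    """Return the label with the highest posterior log-probability."""
    scores = {}
    for label, prior in model["log_prior"].items():
        likelihood = model["log_likelihood"][label]
        # Words never seen during training are skipped.
        scores[label] = prior + sum(likelihood[w] for w in tokens if w in likelihood)
    return max(scores, key=scores.get)


def tune_alpha(train, valid, grid=(0.1, 0.5, 1.0, 2.0)):
    """Pick the alpha from grid that gives the best accuracy on the validation data."""
    def accuracy(alpha):
        model = train_nb(*train, alpha=alpha)
        docs, labels = valid
        hits = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
        return hits / len(labels)

    return max(grid, key=accuracy)
```

Here `train` and `valid` are `(docs, labels)` pairs, where each document is a list of preprocessed tokens. Tuning on a held-out split rather than on the training data itself avoids choosing a value that merely memorizes the training set.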
Predict Sentiment: To predict the sentiment of a new review, run:
```
python predict_sentiment.py
```
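
Conceptually, prediction loads the saved model and vectorizer, turns the review into features, and maps the predicted label to a sentiment. The sketch below is not the code in `predict_sentiment.py`. It assumes the pickled objects expose scikit-learn-style `transform` and `predict` methods and that the positive class is encoded as 1; `load_artifact` and `predict_review` are hypothetical names.

```python
import pickle


def load_artifact(path):
    """Unpickle a saved object. Only unpickle files from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def predict_review(model, vectorizer, review, positive_label=1):
    """Vectorize one review and map the predicted label to a sentiment string."""
    features = vectorizer.transform([review])
    predicted = model.predict(features)[0]
    # Assumption: positive_label matches the encoding used during training.
    return "positive" if predicted == positive_label else "negative"
```

With the repository's artifacts this might be called as `predict_review(load_artifact("latest_model.pkl"), load_artifact("latest_vectorizer.pkl"), "A wonderful film")`, provided the library that produced the pickles is installed. The review would probably also need the same preprocessing that was applied during training.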

Directory Structure

- train_model.py: Script for training the sentiment analysis model.
- predict_sentiment.py: Script for predicting the sentiment of new movie reviews.
- data_preprocessing.py: Script for preprocessing movie reviews (tokenization, stopword removal, lemmatization).
- load_data_big.py: Script for loading the large dataset of movie reviews.
- load_data_small.py: Script for loading the small dataset of movie reviews.
- latest_model.pkl: Trained Naive Bayes model.
- latest_vectorizer.pkl: Vectorizer for transforming text into numerical features.
- requirements.txt: List of Python dependencies required for the project.
- results.md: File containing detailed evaluation results for small and large datasets.

Possible Bugs and Solutions

Data Loading Errors:
- Scenario: Issues with loading data or incorrect paths.
- Solution: Ensure the dataset is in the correct directory and paths are correctly specified in the scripts.
Model Performance Issues:
- Scenario: Lower-than-expected accuracy or incorrect predictions.
- Solution: Check data preprocessing steps and consider experimenting with different models or hyperparameters.

Evaluation Results

Detailed evaluation results, including confusion matrices and performance metrics for both small and large datasets, can be found in `results.md`.
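
As a reminder of how a binary confusion matrix relates to common performance metrics, here is a small standard-library sketch. It does not reproduce the repository's evaluation code or its reported numbers. The function names are hypothetical, and the choice of precision, recall, and F1 alongside accuracy is an assumption about which metrics are of interest.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def summarize(counts):
    """Derive accuracy, precision, recall, and F1 from confusion counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Accuracy alone can hide an imbalance between the two error types, which is why the confusion counts are worth inspecting as well.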

Development Environment

The code for this project was developed using PyCharm, which offers a powerful IDE for Python development.

Happy Testing! 🚀
