Danielkis97/Sentiment-Analysis-of-Movie-Reviews-Using-Machine-Learning

Python | NLTK | Scikit-Learn

Sentiment Analysis of Movie Reviews Using Machine Learning

Overview

This repository contains a machine learning project that performs sentiment analysis on IMDB movie reviews. The project classifies reviews as either positive or negative using a Naive Bayes model.

UML Diagram

The following diagram outlines the key steps in the machine learning pipeline:

[Image: class_diagram]

Main Features

  • Data Preprocessing: Converts reviews into a format suitable for model training by applying tokenization, stopword removal, and lemmatization.
  • Model: Naive Bayes classifier trained with hyperparameter tuning.
  • Evaluation: Model evaluated on small and large datasets, achieving an accuracy of over 84%.
  • Prediction: Predicts sentiment for new movie reviews.
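
The preprocessing steps listed above can be pictured with a small, dependency-free sketch. It is not the code in data_preprocessing.py: the stopword set and the lemma lookup table below are tiny hypothetical stand-ins for the much larger resources a library such as NLTK would provide, and the function name `preprocess` is chosen only for illustration.

```python
import re
from typing import List

# Tiny illustrative stopword set (assumption); a real pipeline would use a
# complete list from an NLP library.
STOPWORDS = {"a", "an", "and", "the", "is", "was", "were", "of", "to", "in", "it", "this"}

# Toy lookup table standing in for a real lemmatizer (assumption).
LEMMAS = {"movies": "movie", "actors": "actor", "loved": "love", "scenes": "scene"}


def preprocess(review: str) -> List[str]:
    """Lowercase, tokenize, drop stopwords, and map tokens to a base form."""
    text = re.sub(r"<[^>]+>", " ", review.lower())  # strip HTML tags such as <br />
    tokens = re.findall(r"[a-z']+", text)           # crude word tokenizer
    kept = [t for t in tokens if t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in kept]


print(preprocess("The actors were great!<br />I loved this movie."))
# ['actor', 'great', 'i', 'love', 'movie']
```

Keeping each step as a separate line makes it easy to swap in a proper tokenizer, stopword list, or lemmatizer later without changing the rest of the pipeline.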

Setup and Installation

Important: Follow these instructions to set up and run the sentiment analysis project.

Prerequisites

  • Python 3.x installed on your local machine.
  • Libraries listed in requirements.txt.

Installation Steps

  1. Clone the repository:

    git clone https://github.com/Danielkis97/Sentiment-Analysis-of-Movie-Reviews-Using-Machine-Learning.git
    cd Sentiment-Analysis-of-Movie-Reviews-Using-Machine-Learning
  2. Set up a virtual environment (optional but recommended):

    • On Windows:

      python -m venv venv
      venv\Scripts\activate
    • On macOS/Linux:

      python3 -m venv venv
      source venv/bin/activate
  3. Install dependencies:

    pip install -r requirements.txt
    
    
  4. Download the dataset: This project uses the IMDB Large Movie Review Dataset. Download it from here. After downloading, extract it into a data/ folder inside the project directory, next to the project scripts. The resulting directory structure should look like this:

project_directory/
├── train_model.py
├── predict_sentiment.py
├── data_preprocessing.py
├── load_data_big.py
├── load_data_small.py
├── requirements.txt
├── data/
│   ├── aclImdb/
│   │   ├── train/
│   │   │   ├── pos/
│   │   │   ├── neg/
│   │   ├── test/
│   │   │   ├── pos/
│   │   │   ├── neg/
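
Given that layout, reading one split into memory could look roughly like the sketch below. This is an illustration only, not the contents of load_data_big.py or load_data_small.py. The function name `load_reviews`, the `.txt` file extension, and the label encoding (1 for positive, 0 for negative) are assumptions made for the example.

```python
from pathlib import Path
from typing import List, Tuple


def load_reviews(split_dir: Path) -> Tuple[List[str], List[int]]:
    """Read every .txt review under split_dir/pos and split_dir/neg.

    Label encoding is an assumption of this sketch: 1 = positive, 0 = negative.
    """
    texts: List[str] = []
    labels: List[int] = []
    for folder, label in (("pos", 1), ("neg", 0)):
        # sorted() keeps the file order deterministic across runs
        for path in sorted((split_dir / folder).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels


# Example call, using the directory from the tree above:
# train_texts, train_labels = load_reviews(Path("data") / "aclImdb" / "train")
```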

  5. Load the Data: Before training the model, you need to load the dataset. Run one of the following commands depending on whether you want to train with a large or small dataset:

    python load_data_big.py    # For large dataset
    python load_data_small.py  # For small dataset
  6. Train the Model: To train the model, run:

    python train_model.py
  7. Predict Sentiment: To predict the sentiment of a new review, run:

    python predict_sentiment.py
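
To give a feel for what the train and predict steps compute, here is a pure-Python sketch of a multinomial Naive Bayes classifier. It is not the code in train_model.py or predict_sentiment.py. The class name `TinyNaiveBayes`, the four toy reviews, and the label strings are all made up for illustration. The smoothing strength `alpha` is shown as an example of the kind of hyperparameter that could be tuned, which is an assumption rather than a statement about this project.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional


class TinyNaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # smoothing strength

    def fit(self, docs: List[List[str]], labels: List[str]) -> "TinyNaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens: List[str]) -> Optional[str]:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (made up for this example).
train_docs = [
    ["great", "acting", "wonderful", "story"],
    ["loved", "it", "great", "fun"],
    ["boring", "plot", "terrible", "acting"],
    ["awful", "waste", "boring"],
]
train_labels = ["positive", "positive", "negative", "negative"]

model = TinyNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["wonderful", "fun", "story"]))    # positive
print(model.predict(["terrible", "boring", "waste"]))  # negative
```

Scores are summed in log space so that long reviews do not underflow to zero when many small probabilities are multiplied.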

Directory Structure

  • train_model.py: Script for training the sentiment analysis model.
  • predict_sentiment.py: Script for predicting the sentiment of new movie reviews.
  • data_preprocessing.py: Script for preprocessing movie reviews (tokenization, stopword removal, lemmatization).
  • load_data_big.py: Script for loading the large dataset of movie reviews.
  • load_data_small.py: Script for loading the small dataset of movie reviews.
  • latest_model.pkl: Trained Naive Bayes model.
  • latest_vectorizer.pkl: Vectorizer for transforming text into numerical features.
  • requirements.txt: List of Python dependencies required for the project.
  • RESULTS.md: File containing detailed evaluation results for small and large datasets.

Possible Bugs and Solutions

  • Data Loading Errors:

    • Scenario: Issues with loading data or incorrect paths.
    • Solution: Ensure the dataset is in the correct directory and paths are correctly specified in the scripts.
  • Model Performance Issues:

    • Scenario: Lower-than-expected accuracy or incorrect predictions.
    • Solution: Check data preprocessing steps and consider experimenting with different models or hyperparameters.
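
For the data loading scenario above, a quick layout check can narrow down path problems before any script runs. The sketch below is a hypothetical helper, not part of the repository. It assumes the dataset was extracted to data/aclImdb as shown in the installation steps.

```python
from pathlib import Path
from typing import List

# Sub-directories expected under the dataset root, per the installation steps.
EXPECTED_DIRS = ["train/pos", "train/neg", "test/pos", "test/neg"]


def missing_dirs(dataset_root: Path) -> List[str]:
    """Return the expected sub-directories that are absent under dataset_root."""
    return [d for d in EXPECTED_DIRS if not (dataset_root / d).is_dir()]


# Run from the project directory.
problems = missing_dirs(Path("data") / "aclImdb")
if problems:
    print("Missing:", ", ".join(problems))
else:
    print("Dataset layout looks complete.")
```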

Evaluation Results

Detailed evaluation results, including confusion matrices and performance metrics for both small and large datasets, can be found in RESULTS.md.
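
For readers new to these metrics, the sketch below shows how accuracy and confusion-matrix counts are derived from true and predicted labels. The label lists are toy values invented for the example and are unrelated to the project's reported results. The function names are likewise illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[str], y_pred: List[str]) -> Dict[Tuple[str, str], int]:
    """Count how often each (actual, predicted) label pair occurs."""
    return dict(Counter(zip(y_true, y_pred)))


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels (made up), not results from this project.
y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neg", "pos"]

print(accuracy(y_true, y_pred))          # 0.8
print(confusion_counts(y_true, y_pred))
```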

Development Environment

The code for this project was developed using PyCharm, which offers a powerful IDE for Python development.

Happy Testing! 🚀
