This project performs sentiment analysis on both text and audio inputs using Natural Language Processing and Machine Learning techniques. The system analyzes customer reviews from the Amazon Alexa dataset and classifies sentiment into Positive, Negative, or Neutral.
For audio input, the system transcribes speech to text with the SpeechRecognition library, extracts MFCC (Mel Frequency Cepstral Coefficients) features from the audio signal with Librosa, and then predicts sentiment using the trained machine learning models.
The project demonstrates an end-to-end sentiment analysis pipeline that covers preprocessing, feature extraction, model training, and deployment as a Flask web application.
The project uses the Amazon Alexa Reviews Dataset (amazon_alexa.tsv), which contains user reviews for Amazon Alexa products.
Sentiment labels are generated using VADER Sentiment Analysis, which produces a compound score that is mapped to three classes:
- Positive
- Negative
- Neutral
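The mapping from a compound score to a class can be sketched as below. This is a minimal illustration, not the project's confirmed code: the function name `label_from_compound` is hypothetical, and the cut-off of 0.05 is the threshold commonly used with VADER, assumed here because the README does not state one.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to one of three classes.

    The 0.05 threshold is an assumption (the conventional VADER cut-off).
    """
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# In the real pipeline the score would come from VADER, for example:
#   from nltk.sentiment.vader import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
for score in (0.62, -0.40, 0.01):
    print(score, label_from_compound(score))
```

Keeping the threshold as a parameter makes it easy to widen or narrow the Neutral band and see how the class balance changes.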
The system provides the following features:
- Text sentiment analysis
- Audio sentiment analysis
- Speech-to-text transcription
- Text preprocessing using NLP techniques
- MFCC feature extraction for audio
- TF-IDF vectorization for text
- Handling class imbalance using SMOTE
- Machine learning based sentiment classification
- Flask based web interface
The text pipeline has four stages:
1. Data Collection
   - Amazon Alexa Reviews Dataset
2. Preprocessing
   - Text cleaning
   - Tokenization
   - Stopwords removal
3. Feature Extraction
   - TF-IDF Vectorization
4. Sentiment Classification
   - Naïve Bayes
   - Logistic Regression
   - VADER Sentiment Analysis
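The preprocessing stage of the text pipeline (cleaning, tokenization, stopword removal) might look roughly like this. It is a dependency-free sketch: the `preprocess` name and the tiny `STOPWORDS` set are illustrative assumptions, and the project itself lists NLTK, whose tokenizer and English stopword list would normally take their place.

```python
import re

# Tiny illustrative stopword set (assumption); NLTK's English stopword
# list would normally be used instead.
STOPWORDS = {"a", "an", "the", "is", "it", "and", "to", "of", "i", "my", "this"}


def preprocess(text: str) -> list[str]:
    """Lowercase, strip non-letters, split on whitespace, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    tokens = text.split()
    # Drop stopwords and single-letter leftovers such as the "s" from "it's".
    return [tok for tok in tokens if tok not in STOPWORDS and len(tok) > 1]


print(preprocess("I LOVE my Echo Dot!!! It's the best purchase of 2018."))
```

The resulting token list can be joined back into a string before it is passed to a TF-IDF vectorizer.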
The audio pipeline has five stages:
1. Data Collection
   - Custom audio clips generated using gTTS
2. Preprocessing
   - Audio processing using Librosa
3. Feature Extraction
   - MFCC (Mel Frequency Cepstral Coefficients) extraction using Librosa
4. Audio Transcription
   - Convert audio to text using SpeechRecognition
5. Sentiment Classification
   - Naïve Bayes
   - Logistic Regression
   - VADER Sentiment Analysis
The following machine learning models are used for sentiment classification:
- Logistic Regression
- Multinomial Naïve Bayes
These models are trained on features extracted using TF-IDF for text and MFCC for audio.
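As a rough sketch of how both models could be trained on TF-IDF features with Scikit-learn, consider the example below. The six sentences are toy data written for this illustration, not rows from amazon_alexa.tsv, and the pipeline layout and hyperparameters are assumptions rather than the project's actual configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy examples for illustration only; not taken from amazon_alexa.tsv.
texts = [
    "love this speaker great sound",
    "works great love it",
    "terrible device stopped working",
    "awful sound waste of money",
    "it is a speaker",
    "arrived on monday",
]
labels = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]

# Each pipeline vectorizes the raw text with TF-IDF, then fits one classifier.
models = {
    "logistic_regression": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
    "multinomial_nb": make_pipeline(TfidfVectorizer(), MultinomialNB()),
}

for name, model in models.items():
    model.fit(texts, labels)
    print(name, model.predict(["great sound love it"]))
```

One design point worth noting: Multinomial Naïve Bayes expects non-negative features, which TF-IDF satisfies. MFCC values can be negative, so they would need rescaling (for example with `MinMaxScaler`) or a different variant such as `GaussianNB` before a Naïve Bayes model could consume them.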
The project is built with the following technologies:
- Python
- Flask
- Scikit-learn
- NLTK
- VADER Sentiment Analysis
- Librosa
- SpeechRecognition
- Pandas
- NumPy
The overall workflow is:
- Load Amazon Alexa review dataset
- Clean and preprocess review text
- Generate sentiment scores using VADER
- Convert text into numerical features using TF-IDF
- Extract MFCC features from audio signals
- Handle class imbalance using SMOTE
- Train Logistic Regression and Naïve Bayes models
- Convert audio to text using Speech Recognition
- Predict sentiment for both text and audio inputs
To install and run the application:
- Clone the repository
git clone https://github.com/your-username/Text-Audio-Sentiment-Analyzer.git
- Install dependencies
pip install -r requirements.txt
- Run the application
python app.py
- Open the application in a browser at the address Flask prints on startup
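For orientation, a Flask entry point for a text prediction endpoint could be structured roughly as follows. Everything here is hypothetical: the `/predict` route, the JSON payload shape, and the keyword-based `predict_sentiment` placeholder are assumptions for illustration, and the project's real `app.py` may be organized differently. The example exercises the route with Flask's test client instead of starting a server.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_sentiment(text: str) -> str:
    """Placeholder for the trained model: a simple keyword rule."""
    lowered = text.lower()
    if "love" in lowered or "great" in lowered:
        return "Positive"
    if "bad" in lowered or "terrible" in lowered:
        return "Negative"
    return "Neutral"


@app.route("/predict", methods=["POST"])  # hypothetical route name
def predict():
    payload = request.get_json(silent=True) or {}
    text = str(payload.get("text", "")) if isinstance(payload, dict) else ""
    if not text.strip():
        return jsonify(error="text is required"), 400
    return jsonify(text=text, sentiment=predict_sentiment(text))


# Exercise the route in-process; app.run() would start the development server.
with app.test_client() as client:
    response = client.post("/predict", json={"text": "I love my Echo"})
    print(response.get_json())
```

In a real deployment the placeholder function would be replaced by a call to the trained vectorizer and classifier loaded at startup.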