Concept Level Sentiment Analysis of Bengali Text

Overview

This project explores concept-level sentiment analysis of Bengali text, focusing on restaurant reviews, comments, and blogs. Instead of relying only on traditional feature-based sentiment analysis, it uses custom-built dictionaries to extract concepts from the text and classifies sentiment from those concepts, with the aim of more accurate classification.

Key Features

Data Collection:
- Collected 2,700 Bengali text reviews, comments, and blogs from Facebook.
- Labeled data manually into positive, negative, and neutral sentiment categories.
Preprocessing Workflow:
- Punctuation Removal: Removed unnecessary symbols and spaces.
- Tokenization & Detokenization: Used nltk to split text into tokens and to rejoin the retained words afterwards.
- Stopword Removal: Eliminated uninformative words such as "অথচ", "অথবা", and "এটি".
- Concept Construction: Built custom dictionaries for nouns, adjectives, verbs, and adverbs to extract sentiment-bearing concepts.
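
The punctuation, tokenization, and stopword steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's code: it uses only the standard library, with `str.split` standing in for the nltk tokenizer, and the stopword set holds just the three example words listed above. All function names are hypothetical. Punctuation is detected by Unicode category instead of the regex class `\w`, because Python's `\w` does not match Bengali vowel signs and would strip them from words.

```python
import unicodedata

# Assumption: a tiny stopword set built from the three examples in this README.
STOPWORDS = {"অথচ", "অথবা", "এটি"}


def remove_punctuation(text: str) -> str:
    """Replace every Unicode punctuation (P*) or symbol (S*) character with a space."""
    return "".join(
        " " if unicodedata.category(ch)[0] in ("P", "S") else ch for ch in text
    )


def tokenize(text: str) -> list[str]:
    """Split on whitespace (a stand-in for the nltk tokenizer)."""
    return text.split()


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stopword set."""
    return [tok for tok in tokens if tok not in STOPWORDS]


def detokenize(tokens: list[str]) -> str:
    """Rejoin the retained tokens with single spaces."""
    return " ".join(tokens)


def preprocess(text: str) -> str:
    """Run punctuation removal, tokenization, stopword removal, detokenization."""
    return detokenize(remove_stopwords(tokenize(remove_punctuation(text))))


print(preprocess("এটি খুব ভালো খাবার।"))
```

In this example the sentence-final danda and the stopword "এটি" are dropped, and the remaining words are rejoined with single spaces.
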
Machine Learning Models:
- Evaluated algorithms: Naïve Bayes, Support Vector Machine (SVM), Random Forest, Decision Tree, and K-Nearest Neighbor.
- Compared model performance on feature-based and concept-level datasets.

Methodology

Concept Extraction:
- Built custom dictionaries to identify sentiment-rich concepts.
- Mapped these concepts to BanglaSenticNet for sentiment matching.
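
One way to picture dictionary-based concept extraction and lexicon matching is the sketch below. It is an illustration only: the README does not describe how concepts are formed, so the adjacent-tag patterns, the tiny `POS_DICT`, the made-up entries in `POLARITY_LEXICON` (a plain dict standing in for a BanglaSenticNet lookup), and the function names are all assumptions.

```python
# Hypothetical part-of-speech dictionary (the project's real ones are larger).
POS_DICT = {
    "খাবার": "NOUN",   # food
    "পরিবেশ": "NOUN",  # environment
    "ভালো": "ADJ",     # good
    "খারাপ": "ADJ",    # bad
    "খুব": "ADV",      # very
}

# Assumed concept shapes: adjacent tag pairs that are joined into one concept.
CONCEPT_PATTERNS = {("ADV", "ADJ"), ("ADJ", "NOUN")}

# Made-up polarity entries standing in for a BanglaSenticNet lookup.
POLARITY_LEXICON = {
    "খুব ভালো": "positive",
    "ভালো": "positive",
    "খারাপ": "negative",
}


def extract_concepts(tokens):
    """Return two-word concepts matching a pattern, then single adjectives."""
    tags = [POS_DICT.get(tok) for tok in tokens]
    concepts = []
    for i in range(len(tokens) - 1):
        if (tags[i], tags[i + 1]) in CONCEPT_PATTERNS:
            concepts.append(f"{tokens[i]} {tokens[i + 1]}")
    # Single adjectives are kept as one-word concepts.
    concepts.extend(tok for tok, tag in zip(tokens, tags) if tag == "ADJ")
    return concepts


def match_sentiment(concepts):
    """Pair each concept found in the lexicon with its polarity label."""
    return [(c, POLARITY_LEXICON[c]) for c in concepts if c in POLARITY_LEXICON]


concepts = extract_concepts(["খাবার", "খুব", "ভালো"])
print(match_sentiment(concepts))
```

Concepts that are missing from the lexicon are simply skipped here; a real pipeline would need a policy for such unmatched concepts.
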
Data Processing:
- Cleaned data and transformed text into concepts using custom algorithms.
- Generated two datasets: one for feature-based analysis and another for concept-level analysis.
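
As a rough illustration of how the same reviews could yield two datasets, the sketch below builds count vectors from either word tokens (feature-based) or extracted concepts (concept-level). The README does not specify the feature representation, so the bag-of-words encoding, the sample data, and the helper names `build_vocabulary` and `vectorize` are assumptions.

```python
from collections import Counter


def build_vocabulary(documents):
    """Collect the sorted set of units (tokens or concepts) across all documents."""
    return sorted({unit for doc in documents for unit in doc})


def vectorize(doc, vocabulary):
    """Count how often each vocabulary unit occurs in one document."""
    counts = Counter(doc)
    return [counts[unit] for unit in vocabulary]


# Hypothetical sample: the same two reviews as word tokens and as concepts.
token_docs = [["খাবার", "খুব", "ভালো"], ["খাবার", "খারাপ"]]
concept_docs = [["খুব ভালো", "ভালো"], ["খারাপ"]]
labels = ["positive", "negative"]

token_vocab = build_vocabulary(token_docs)
concept_vocab = build_vocabulary(concept_docs)

# Each dataset is a list of (vector, label) rows ready for a classifier.
feature_dataset = [(vectorize(d, token_vocab), y) for d, y in zip(token_docs, labels)]
concept_dataset = [(vectorize(d, concept_vocab), y) for d, y in zip(concept_docs, labels)]

print(feature_dataset)
print(concept_dataset)
```

Both datasets share the same labels, so any difference in classifier performance can be attributed to the representation.
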
Evaluation Metrics:
- Used metrics such as accuracy, precision, recall, and F1-score for model performance evaluation.
- Used a confusion matrix to analyze predictions for each sentiment class.
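
For reference, the metrics listed above can be computed from a confusion matrix as in the following sketch. It uses only the standard library; the function names and the sample labels are hypothetical and are not results from this project.

```python
from collections import Counter

LABELS = ["positive", "negative", "neutral"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of predictions on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_scores(matrix, labels=LABELS):
    """Precision, recall, and F1 for each label (0.0 when undefined)."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp   # predicted as label, but wrong
        fn = sum(matrix[i]) - tp                  # true label, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Made-up example labels, for illustration only.
y_true = ["positive", "positive", "negative", "neutral", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "neutral", "negative", "positive"]

matrix = confusion_matrix(y_true, y_pred)
print(matrix)
print(accuracy(matrix))
print(per_class_scores(matrix))
```
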

Results

Feature-Based Approach:
- SVM achieved the highest accuracy: 84%.
- Other models' accuracy: Random Forest (73%), Decision Tree (72%), Naïve Bayes (66%).
Concept-Level Approach:
- SVM achieved the highest accuracy: 96%.
- Other models' accuracy: Random Forest (75%), Decision Tree (71%), Naïve Bayes (68%), K-Nearest Neighbor (68%).
Performance Comparison:
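- On the reported accuracies, the concept-level approach improved SVM (84% to 96%), Random Forest (73% to 75%), and Naïve Bayes (66% to 68%), while Decision Tree was marginally lower (72% to 71%). The gain is largest for SVM, which supports the efficacy of concept-level features for sentiment analysis of Bengali text.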

Conclusion

The proposed concept-level approach improves sentiment detection by incorporating domain-specific knowledge from dictionaries.
SVM achieved the highest accuracy under both the feature-based and concept-level approaches.
The research highlights the challenges of extracting meaningful concepts from Bengali text and the effectiveness of custom-built dictionaries.

Future Directions

Advanced Concept Extraction:
- Implement dependency-based semantic parsing for improved results.
Expanding Sentiment Labels:
- Move beyond positive, negative, and neutral to include emotions like joy, anger, and sadness.
Larger Datasets:
- Work with a more extensive and diverse dataset to generalize findings.
Multi-language Support:
- Apply the methodology to other languages using R or Python.

References

Key references include:

- BanglaSenticNet for concept matching.
- SentiWordNet for sentiment classification.
- Various research papers on multilingual sentiment analysis and dependency parsing.

For more details, please refer to the research paper attached in this repository.
