📘 Offensive Language & Hate Speech Detection Using Transformer, BiLSTM, and Hybrid Deep Learning Models
This repository contains the full implementation, experiments, and results for a comparative study of multiple NLP architectures, including BERT, RoBERTa, HateBERT, BiLSTM, and a Hybrid Transformer + BiLSTM model, applied to offensive language and hate-speech detection.
The project evaluates these models on two benchmark datasets (OLID and HateXplain) and also explores the performance of a hybrid architecture combining contextual embeddings from Transformers with sequential learning from BiLSTM.
This work supports a research study analyzing model behaviour, performance limitations, and the effectiveness of hybrid deep-learning techniques in offensive-language classification.
- Performed three independent experiments:
  - Experiment 1: Classification on the OLID dataset
  - Experiment 2: Classification on the HateXplain dataset
  - Experiment 3: Hybrid model combining RoBERTa embeddings + BiLSTM
- Implemented and evaluated:
  - BERT-base
  - RoBERTa-base
  - HateBERT
  - BiLSTM with GloVe embeddings
  - Hybrid Transformer + BiLSTM
- Generated:
  - Confusion matrices
  - Train/validation loss curves
  - Performance tables (Accuracy, Precision, Recall, F1)
- Fully reproducible pipeline: preprocessing → training → evaluation
- Modular experiment structure for clarity and replicability
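The actual preprocessing lives in the per-experiment `*_preprocessing/` scripts; as a rough illustration of the kind of tweet cleaning such a pipeline typically performs (the function name `clean_tweet` and the exact rules here are hypothetical, not the repo's implementation):

```python
import re

def clean_tweet(text: str) -> str:
    """Illustrative tweet cleaning: lowercase, strip URLs,
    normalize mentions, drop punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)      # drop URLs
    text = re.sub(r"@\w+", "@user", text)         # normalize mentions (OLID masks users)
    text = re.sub(r"[^a-z0-9@#'\s]", " ", text)   # keep only basic tokens
    text = re.sub(r"\s+", " ", text).strip()      # collapse whitespace
    return text

print(clean_tweet("@John THIS is AWFUL!! http://t.co/xyz"))  # → "@user this is awful"
```

The repo's real scripts may differ (e.g. in how hashtags or emoji are handled); this only sketches the preprocessing → training → evaluation flow's first stage.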
```
/Project/
│
├── Dataset/
│   ├── OLID.csv
│   └── hatexplain.csv
│
├── Experiments/
│   ├── exp1/    → OLID dataset experiments
│   │   ├── olid_preprocessing/
│   │   └── model_training/
│   │       ├── transformer_model_1/
│   │       ├── transformer_model_2/
│   │       ├── transformer_model_3/
│   │       ├── bilstm_model/
│   │       └── scripts/
│   │
│   ├── exp2/    → HateXplain dataset experiments
│   │   ├── hatexplain_preprocessing/
│   │   └── model_training/
│   │
│   └── exp3/    → Hybrid model experiments
│       ├── combined_dataset/
│       └── model_training/
│
└── requirements.txt
```
Each experiment folder contains:
- Preprocessing scripts
- Processed datasets
- Model-specific training scripts
- Checkpoints & logs
- Evaluation outputs (graphs, matrices, results)
Evaluated all models on OLID (the Offensive Language Identification Dataset).
Models tested:
- BERT-base
- RoBERTa-base
- HateBERT
- BiLSTM
- Hybrid (RoBERTa embeddings + BiLSTM)
Metrics computed:
- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
- Train/Val Loss Curves
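The metrics listed above can all be computed with scikit-learn; a minimal sketch on hypothetical binary labels (0 = not offensive, 1 = offensive — the label values here are made up for illustration, not taken from the repo's outputs):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# Hypothetical gold labels and model predictions
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
cm = confusion_matrix(y_true, y_pred)  # rows = true class, cols = predicted

print(f"Accuracy={acc:.3f} Precision={prec:.3f} Recall={rec:.3f} F1={f1:.3f}")
print(cm)
```

Macro averaging (used here) weights both classes equally, which matters for imbalanced offensive-language datasets; the repo's evaluation scripts may use a different averaging mode.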
Performed the same evaluation pipeline as Experiment 1 on the HateXplain dataset.
HateXplain is a multi-annotator, more complex dataset, allowing deeper analysis of:
- Model robustness
- Context understanding
- Semantic generalization
Designed a hybrid architecture that feeds RoBERTa contextual embeddings into a BiLSTM sequence encoder.
Purpose:
- Test whether combining contextual embeddings with sequential modeling improves performance.
Findings:
- The hybrid model did not outperform the standalone Transformers.
- This shows that combining architectures does not guarantee improvement, especially when Transformers already capture long-range semantics.
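A minimal PyTorch sketch of the hybrid idea (BiLSTM over contextual token embeddings, mean-pooled into a linear classification head). The class name, hidden size, and the random tensor standing in for RoBERTa's output are assumptions made to keep the example self-contained; the repo's `exp3` training code is the authoritative implementation:

```python
import torch
import torch.nn as nn

class HybridClassifier(nn.Module):
    """Sketch: BiLSTM over Transformer token embeddings + linear head."""
    def __init__(self, emb_dim: int = 768, hidden: int = 128, num_classes: int = 2):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, emb_dim), e.g. RoBERTa last hidden state
        out, _ = self.bilstm(embeddings)   # (batch, seq_len, 2 * hidden)
        pooled = out.mean(dim=1)           # mean-pool over tokens
        return self.head(pooled)           # (batch, num_classes)

# Random tensor standing in for RoBERTa output: 4 sequences × 16 tokens × 768 dims
dummy = torch.randn(4, 16, 768)
logits = HybridClassifier()(dummy)
print(logits.shape)  # torch.Size([4, 2])
```

In the actual experiments the embeddings would come from a (frozen or fine-tuned) RoBERTa forward pass rather than `torch.randn`.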
HateXplain results:

| Model | Accuracy | Notes |
|---|---|---|
| RoBERTa | Highest | Best contextual understanding |
| BERT | Very close | Stable performance |
| HateBERT | Similar | Domain-specific advantages |
OLID results:

| Model | Accuracy | Notes |
|---|---|---|
| BERT | 0.8501 | Best overall |
| HateBERT | 0.8481 | Competitively close |
| RoBERTa | 0.8433 | Slight drop on OLID |
BiLSTM and hybrid models:

- Underperformed on both datasets
- Lower accuracy and F1 than the Transformer baselines
- Highlights the challenges of merging pretrained embeddings with sequence models
(Detailed tables, confusion matrices, and curves are included inside each experiment folder.)
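The train/validation loss curves stored in each experiment folder can be reproduced with a short matplotlib script; the per-epoch loss values below are placeholders, not the repo's logged numbers:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, safe for servers/CI
import matplotlib.pyplot as plt

# Placeholder per-epoch losses (real values come from the training logs)
train_loss = [0.62, 0.41, 0.30, 0.24, 0.21]
val_loss = [0.58, 0.44, 0.38, 0.37, 0.39]

epochs = range(1, len(train_loss) + 1)
plt.plot(epochs, train_loss, marker="o", label="train")
plt.plot(epochs, val_loss, marker="s", label="validation")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Train vs. validation loss")
plt.legend()
plt.savefig("loss_curve.png", dpi=120)
```

A validation loss that flattens or rises while training loss keeps falling (as in the placeholder values) is the overfitting signal these curves are meant to expose.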
```bash
git clone https://github.com/Vaibhav-Pant/Transformer-BiLSTM.git
cd Transformer-BiLSTM
pip install -r requirements.txt
```

Place:

- OLID.csv
- hatexplain.csv

inside the /Dataset/ directory.

(Datasets are not included due to license restrictions.)

```bash
python Experiments/exp1/olid_preprocessing/scripts/preprocess.py
python Experiments/exp1/model_training/transformer_model_1/train.py
python Experiments/exp3/model_training/hybrid/train.py
```

This repository is part of the research work:
“Framework for offensive language detection using Transformer and Bi-LSTM”
The code directly supports:
- Dataset preprocessing
- Model training
- Performance evaluation
- Visualization generation