📘 Offensive Language & Hate Speech Detection Using Transformer, BiLSTM, and Hybrid Deep Learning Models
This repository contains the full implementation, experiments, and results for a comparative study of multiple NLP architectures, including BERT, RoBERTa, HateBERT, BiLSTM, and a Hybrid Transformer + BiLSTM model, applied to offensive language and hate-speech detection.
The project evaluates these models on two benchmark datasets (OLID and HateXplain) and also explores the performance of a hybrid architecture combining contextual embeddings from Transformers with sequential learning from BiLSTM.
This work supports a research study analyzing model behaviour, performance limitations, and the effectiveness of hybrid deep-learning techniques in offensive-language classification.
- Performed three independent experiments:
  - Experiment 1: Classification on the OLID dataset
  - Experiment 2: Classification on the HateXplain dataset
  - Experiment 3: Hybrid model combining RoBERTa embeddings + BiLSTM
- Implemented and evaluated:
  - BERT-base
  - RoBERTa-base
  - HateBERT
  - BiLSTM with GloVe embeddings
  - Hybrid Transformer + BiLSTM
- Generated:
  - Confusion matrices
  - Train/validation loss curves
  - Performance tables (Accuracy, Precision, Recall, F1)
- Fully reproducible pipeline: preprocessing → training → evaluation
- Modular experiment structure for clarity and replicability
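The actual preprocessing lives in the per-experiment `*_preprocessing/` scripts; as a rough illustration of the kind of tweet cleaning such a pipeline typically performs (the function name `clean_tweet` and the exact rules here are hypothetical, not the repo's implementation):

```python
import re

def clean_tweet(text: str) -> str:
    """Illustrative tweet cleaning: lowercase, strip URLs,
    normalize mentions, drop punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)      # drop URLs
    text = re.sub(r"@\w+", "@user", text)         # normalize mentions (OLID masks users)
    text = re.sub(r"[^a-z0-9@#'\s]", " ", text)   # keep only basic tokens
    text = re.sub(r"\s+", " ", text).strip()      # collapse whitespace
    return text

print(clean_tweet("@John THIS is AWFUL!! http://t.co/xyz"))  # → "@user this is awful"
```

The repo's real scripts may differ (e.g. in how hashtags or emoji are handled); this only sketches the preprocessing → training → evaluation flow's first stage.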
```
/Project/
│
├── Dataset/
│   ├── OLID.csv
│   └── hatexplain.csv
│
├── Experiments/
│   ├── exp1/    → OLID dataset experiments
│   │   ├── olid_preprocessing/
│   │   └── model_training/
│   │       ├── transformer_model_1/
│   │       ├── transformer_model_2/
│   │       ├── transformer_model_3/
│   │       ├── bilstm_model/
│   │       └── scripts/
│   │
│   ├── exp2/    → HateXplain dataset experiments
│   │   ├── hatexplain_preprocessing/
│   │   └── model_training/
│   │
│   └── exp3/    → Hybrid model experiments
│       ├── combined_dataset/
│       └── model_training/
│
└── requirements.txt
```
Each experiment folder contains:
- Preprocessing scripts
- Processed datasets
- Model-specific training scripts
- Checkpoints & logs
- Evaluation outputs (graphs, matrices, results)
Evaluated all models on OLID (the Offensive Language Identification Dataset).
Models tested:
- BERT-base
- RoBERTa-base
- HateBERT
- BiLSTM
- Hybrid (RoBERTa embeddings + BiLSTM)
Metrics computed:
- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
- Train/Val Loss Curves
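The metrics listed above can all be computed with scikit-learn; a minimal sketch on hypothetical binary labels (0 = not offensive, 1 = offensive — the label values here are made up for illustration, not taken from the repo's outputs):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# Hypothetical gold labels and model predictions
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
cm = confusion_matrix(y_true, y_pred)  # rows = true class, cols = predicted

print(f"Accuracy={acc:.3f} Precision={prec:.3f} Recall={rec:.3f} F1={f1:.3f}")
print(cm)
```

Macro averaging (used here) weights both classes equally, which matters for imbalanced offensive-language datasets; the repo's evaluation scripts may use a different averaging mode.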
Performed the same evaluation pipeline as Experiment 1 on the HateXplain dataset.
HateXplain is a multi-annotator, more complex dataset, allowing deeper analysis of:
- Model robustness
- Context understanding
- Semantic generalization
Designed a hybrid architecture that feeds RoBERTa contextual embeddings into a BiLSTM sequence encoder.
Purpose:
- Test whether combining contextual embeddings with sequential modeling improves performance.
Findings:
- The hybrid model did not outperform the standalone Transformers.
- This shows that combining architectures does not guarantee improvement, especially when Transformers already capture long-range semantics.
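A minimal PyTorch sketch of the hybrid idea (BiLSTM over contextual token embeddings, mean-pooled into a linear classification head). The class name, hidden size, and the random tensor standing in for RoBERTa's output are assumptions made to keep the example self-contained; the repo's `exp3` training code is the authoritative implementation:

```python
import torch
import torch.nn as nn

class HybridClassifier(nn.Module):
    """Sketch: BiLSTM over Transformer token embeddings + linear head."""
    def __init__(self, emb_dim: int = 768, hidden: int = 128, num_classes: int = 2):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, emb_dim), e.g. RoBERTa last hidden state
        out, _ = self.bilstm(embeddings)   # (batch, seq_len, 2 * hidden)
        pooled = out.mean(dim=1)           # mean-pool over tokens
        return self.head(pooled)           # (batch, num_classes)

# Random tensor standing in for RoBERTa output: 4 sequences × 16 tokens × 768 dims
dummy = torch.randn(4, 16, 768)
logits = HybridClassifier()(dummy)
print(logits.shape)  # torch.Size([4, 2])
```

In the actual experiments the embeddings would come from a (frozen or fine-tuned) RoBERTa forward pass rather than `torch.randn`.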
HateXplain results:

| Model | Accuracy | Notes |
|---|---|---|
| RoBERTa | Highest | Best contextual understanding |
| BERT | Very close | Stable performance |
| HateBERT | Similar | Domain-specific advantages |
OLID results:

| Model | Accuracy | Notes |
|---|---|---|
| BERT | 0.8501 | Best overall |
| HateBERT | 0.8481 | Competitively close |
| RoBERTa | 0.8433 | Slight drop on OLID |
BiLSTM and hybrid models:

- Underperformed on both datasets
- Lower accuracy and F1 than the Transformer baselines
- Highlights the challenges of merging pretrained embeddings with sequence models
(Detailed tables, confusion matrices, and curves are included inside each experiment folder.)
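The train/validation loss curves stored in each experiment folder can be reproduced with a short matplotlib script; the per-epoch loss values below are placeholders, not the repo's logged numbers:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, safe for servers/CI
import matplotlib.pyplot as plt

# Placeholder per-epoch losses (real values come from the training logs)
train_loss = [0.62, 0.41, 0.30, 0.24, 0.21]
val_loss = [0.58, 0.44, 0.38, 0.37, 0.39]

epochs = range(1, len(train_loss) + 1)
plt.plot(epochs, train_loss, marker="o", label="train")
plt.plot(epochs, val_loss, marker="s", label="validation")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Train vs. validation loss")
plt.legend()
plt.savefig("loss_curve.png", dpi=120)
```

A validation loss that flattens or rises while training loss keeps falling (as in the placeholder values) is the overfitting signal these curves are meant to expose.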
```bash
git clone https://github.com/Vaibhav-Pant/Transformer-BiLSTM.git
cd Transformer-BiLSTM
pip install -r requirements.txt
```

Place:

- OLID.csv
- hatexplain.csv

inside the /Dataset/ directory.

(Datasets are not included due to license restrictions.)

```bash
python Experiments/exp1/olid_preprocessing/scripts/preprocess.py
python Experiments/exp1/model_training/transformer_model_1/train.py
python Experiments/exp3/model_training/hybrid/train.py
```

This repository is part of the research work:
“Framework for offensive language detection using Transformer and Bi-LSTM”
The code directly supports:
- Dataset preprocessing
- Model training
- Performance evaluation
- Visualization generation