A nuanced framework for detecting social bias in text that addresses the limitations of binary classification methods through multi-label token classification, focusing on three key linguistic components: Generalizations, Unfairness, and Stereotypes (GUS).
Detecting social bias in text is particularly challenging because binary classification methods oversimplify nuanced biases and fail to identify their root cause. These methods also carry a high emotional cost when content is misclassified as either "biased" or "fair."
The GUS framework addresses these shortcomings by focusing on three key linguistic components underlying social bias:
- 🔍 Generalizations: Broad statements about groups
- ⚖️ Unfairness: Discriminatory or inequitable treatment
- 🏷️ Stereotypes: Oversimplified assumptions about groups
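Because a single token can carry more than one of these labels at once (e.g., part of both a generalization and an unfair claim), the framework uses multi-hot token tags rather than one class per token. A minimal sketch of this encoding (the label names, example sentence, and spans here are illustrative, not taken from the released dataset):

```python
# Multi-hot encoding of GUS labels per token: each token gets a binary
# vector over the three entity types, so overlapping labels can coexist.
LABELS = ["GEN", "UNFAIR", "STEREO"]  # Generalization, Unfairness, Stereotype

def encode_tokens(tokens, spans):
    """spans: list of (start, end_exclusive, label) over token indices."""
    vectors = [[0] * len(LABELS) for _ in tokens]
    for start, end, label in spans:
        col = LABELS.index(label)
        for i in range(start, end):
            vectors[i][col] = 1
    return vectors

tokens = ["All", "politicians", "are", "corrupt"]
# "All politicians" reads as a generalization, "corrupt" as a stereotype,
# and the whole claim could also be tagged unfair -- the labels overlap.
spans = [(0, 2, "GEN"), (3, 4, "STEREO"), (0, 4, "UNFAIR")]
print(encode_tokens(tokens, spans))
# [[1, 1, 0], [1, 1, 0], [0, 1, 0], [0, 1, 1]]
```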
Traditional sequence-level classification suffers from:
- Root cause blindness: Cannot identify specific biased terms
- Over-abstraction: Makes controversial binary decisions
- Limited interpretability: Provides no insight into why text is biased
Our multi-label NER framework solves these problems by:
- Pinpointing bias: Identifies exact terms that contain bias
- Categorizing bias type: Classifies specific types of bias (Generalization, Unfairness, or Stereotype)
- Providing transparency: Shows users exactly what is biased and why
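At inference time, multi-label token classification typically applies an independent sigmoid per label and keeps every label whose probability clears a threshold, so a token may receive zero, one, or several GUS tags. A hedged sketch of that decoding step (the logit values and the 0.5 threshold are illustrative assumptions, not the released model's configuration):

```python
import math

LABELS = ["GEN", "UNFAIR", "STEREO"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode(logits_per_token, threshold=0.5):
    """Map raw per-token logits to a (possibly empty) set of GUS labels."""
    out = []
    for logits in logits_per_token:
        tags = {lab for lab, z in zip(LABELS, logits) if sigmoid(z) >= threshold}
        out.append(tags or {"O"})  # nothing above threshold -> outside tag
    return out

# Illustrative logits for two tokens: the first fires GEN and UNFAIR
# simultaneously, the second fires nothing and falls back to "O".
print(decode([[2.1, 0.4, -3.0], [-2.0, -1.5, -4.0]]))
# [{'GEN', 'UNFAIR'}, {'O'}]
```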
Our experiment combines:
- Semi-automated dataset creation with human verification
- Discriminative (encoder-only) models and generative (decoder-only) LLMs
- Multi-label token classification task for nuanced bias detection
Extensive experiments demonstrate that encoder-only models are particularly effective for this complex task, often outperforming state-of-the-art methods in macro F1-score, entity-wise F1-score, and Hamming loss.
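For reference, two of these metrics are straightforward to compute on multi-hot token labels: Hamming loss is the fraction of (token, label) cells the model gets wrong, and macro F1 averages per-label F1 scores. A minimal sketch with toy data (the example arrays are illustrative, not experimental results):

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (token, label) cells where prediction disagrees with truth."""
    cells = [(t, p) for row_t, row_p in zip(y_true, y_pred)
             for t, p in zip(row_t, row_p)]
    return sum(t != p for t, p in cells) / len(cells)

def macro_f1(y_true, y_pred, n_labels=3):
    """Unweighted mean of per-label F1 over the multi-hot columns."""
    f1s = []
    for c in range(n_labels):
        tp = sum(t[c] and p[c] for t, p in zip(y_true, y_pred))
        fp = sum((not t[c]) and p[c] for t, p in zip(y_true, y_pred))
        fn = sum(t[c] and (not p[c]) for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / n_labels

# Three tokens, three labels; the model misses one UNFAIR cell.
y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 0, 1]]
print(hamming_loss(y_true, y_pred))  # 1 wrong cell of 9 -> 0.111...
print(macro_f1(y_true, y_pred))     # (1.0 + 0.0 + 1.0) / 3 -> 0.666...
```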
Detect and classify bias-related entities: Generalizations, Unfairness, and Stereotypes at the token level.
| Resource | Description |
|---|---|
| Annotation Pipeline | Annotate any dataset with GUS entities |
| BERT Training | Train BERT for multi-label token classification |
Note: Binary classification was explored during development but has significant limitations. It's included here for completeness and comparison purposes.
Limitations of sequence-level classification:
- Cannot identify which specific words or phrases are biased
- Makes abstract decisions that can be controversial
- Provides no interpretability for bias detection decisions
- Oversimplifies nuanced social biases into binary categories
| Resource | Description |
|---|---|
| Binary Classification Training | Legacy approach - Train BERT for sequence-level classification |
- Paper: The GUS Framework: Benchmarking Social Bias Classification with Discriminative (Encoder-Only) and Generative (Decoder-Only) Language Models
- HuggingFace Collection: Models & Datasets
- Interactive Demo: Try it out!
If you use this work in your research, please cite:
@misc{powers2025gusframeworkbenchmarkingsocial,
  title={The GUS Framework: Benchmarking Social Bias Classification with Discriminative (Encoder-Only) and Generative (Decoder-Only) Language Models},
  author={Maximus Powers and Shaina Raza and Alex Chang and Umang Mavani and Harshitha Reddy Jonala and Ansh Tiwari and Hua Wei},
  year={2025},
  eprint={2410.08388},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.08388},
}