
Framework for named entity recognition of generalizations, unfairness, and stereotypes in text.


maximus-powers/gus-net


The GUS Framework (GUS-Net)


A nuanced framework for detecting social bias in text that addresses the limitations of binary classification methods through multi-label token classification, focusing on three key linguistic components: Generalizations, Unfairness, and Stereotypes (GUS).

Overview

Detecting social bias in text is particularly challenging because binary classification methods oversimplify nuanced biases and fail to identify the root cause of bias. When these methods misclassify content as either "biased" or "fair," the result often carries a high emotional impact.

The GUS framework addresses these shortcomings by focusing on three key linguistic components underlying social bias:

  • 🔍 Generalizations: Broad statements about groups
  • ⚖️ Unfairness: Discriminatory or inequitable treatment
  • 🏷️ Stereotypes: Oversimplified assumptions about groups

Why Multi-Label NER Over Binary Classification?

Traditional sequence-level classification suffers from:

  • Root cause blindness: Cannot identify specific biased terms
  • Over-abstraction: Makes controversial binary decisions
  • Limited interpretability: Provides no insight into why text is biased

Our multi-label NER framework solves these problems by:

  • Pinpointing bias: Identifies exact terms that contain bias
  • Categorizing bias type: Classifies specific types of bias (Generalization, Unfairness, or Stereotype)
  • Providing transparency: Shows users exactly what and why something is biased
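To make "pinpointing" concrete, here is a minimal Python sketch that turns per-token BIO tag sets into labeled spans. It assumes B-/I- prefixes on the entity types GEN, UNFAIR, and STEREO; these tag names, the function `extract_spans`, and the example annotation are illustrative assumptions, not GUS-Net's actual code. Each entity type is decoded independently, so a generalization and a stereotype can overlap.

```python
from typing import List, Set, Tuple

ENTITY_TYPES = ["GEN", "UNFAIR", "STEREO"]  # assumed label suffixes (hypothetical)


def extract_spans(tokens: List[str], token_tags: List[Set[str]]) -> List[Tuple[str, int, int, str]]:
    """Decode per-token BIO tag sets into (type, start, end_exclusive, text) spans.

    Each entity type is scanned separately, so spans of different types may overlap.
    """
    spans: List[Tuple[str, int, int, str]] = []
    for etype in ENTITY_TYPES:
        start = None  # index where the currently open span began, if any
        for i, tags in enumerate(token_tags):
            if f"B-{etype}" in tags:
                # A new B- tag closes any open span of this type and opens another.
                if start is not None:
                    spans.append((etype, start, i, " ".join(tokens[start:i])))
                start = i
            elif f"I-{etype}" in tags:
                if start is None:  # lenient: treat a stray I- tag as a span start
                    start = i
            elif start is not None:
                # Token carries no tag of this type: close the open span.
                spans.append((etype, start, i, " ".join(tokens[start:i])))
                start = None
        if start is not None:  # span running to the end of the sentence
            spans.append((etype, start, len(tokens), " ".join(tokens[start:])))
    return spans


tokens = ["Politicians", "never", "tell", "the", "truth"]
tags = [{"B-GEN", "B-STEREO"}, {"I-STEREO"}, {"I-STEREO"}, {"I-STEREO"}, {"I-STEREO"}]
for span in extract_spans(tokens, tags):
    print(span)
# ('GEN', 0, 1, 'Politicians')
# ('STEREO', 0, 5, 'Politicians never tell the truth')
```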

Methodology

Our experiment combines:

  1. Semi-automated dataset creation with human verification
  2. Discriminative (encoder-only) models and generative (decoder-only) LLMs
  3. Multi-label token classification task for nuanced bias detection
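The practical difference between multi-label and ordinary single-label token classification shows up at decoding time. The sketch below assumes a hypothetical seven-label BIO schema and a 0.5 threshold: it applies an independent sigmoid to each label's logit and keeps every label that passes, whereas softmax/argmax decoding keeps exactly one label per token. The logits are made up for illustration.

```python
import math
from typing import List

# Hypothetical label schema; GUS-Net's actual label names may differ.
LABELS = ["O", "B-GEN", "I-GEN", "B-UNFAIR", "I-UNFAIR", "B-STEREO", "I-STEREO"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_multi_label(logits: List[float], threshold: float = 0.5) -> List[str]:
    """Keep every label whose independent sigmoid probability reaches the threshold,
    so one token can receive several labels."""
    chosen = [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]
    return chosen or ["O"]  # fall back to "O" when no label passes


def decode_single_label(logits: List[float]) -> str:
    """Single-label decoding: argmax over logits (same result as argmax over softmax)."""
    return LABELS[max(range(len(logits)), key=lambda i: logits[i])]


token_logits = [-3.0, 2.1, -2.5, -4.0, -3.2, 1.4, -1.0]
print(decode_multi_label(token_logits))   # ['B-GEN', 'B-STEREO']
print(decode_single_label(token_logits))  # B-GEN
```

The single-label decoder has to drop the stereotype signal on this token, while the multi-label decoder keeps both.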

Extensive experiments demonstrate that encoder-only models are particularly effective for this complex task, often outperforming state-of-the-art methods on macro F1-score, entity-wise F1-score, and Hamming loss (where lower is better).
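For reference, here is a minimal sketch of two of these metrics over multi-hot token-label matrices (rows are tokens, columns are labels). It is not the project's evaluation code: how the paper averages scores and treats labels with no positives is not stated here, and this sketch scores such a label as 0.0. Entity-wise F1, which compares whole spans rather than individual tokens, is omitted.

```python
from typing import List

Matrix = List[List[int]]  # rows = tokens, columns = labels (multi-hot 0/1)


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of individual token-label decisions that are wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / total


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of per-label F1 scores over all label columns."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        pairs = [(rt[j], rp[j]) for rt, rp in zip(y_true, y_pred)]
        tp = sum(t == 1 and p == 1 for t, p in pairs)
        fp = sum(t == 0 and p == 1 for t, p in pairs)
        fn = sum(t == 1 and p == 0 for t, p in pairs)
        denom = 2 * tp + fp + fn
        # Convention in this sketch: a label with no true or predicted positives scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
print(round(hamming_loss(y_true, y_pred), 4))  # 0.2222
print(round(macro_f1(y_true, y_pred), 4))      # 0.7778
```

On this toy example two of nine token-label decisions are wrong (Hamming loss 2/9), and the per-label F1 scores of 1, 2/3, and 2/3 average to 7/9.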


Named Entity Recognition (Token-level)

Detect and classify bias-related entities: Generalizations, Unfairness, and Stereotypes at the token level.
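Token-level training targets can be derived from span annotations. The following is a minimal sketch, assuming spans are given as hypothetical `(entity_type, start, end_exclusive)` token indices and the same assumed BIO label schema. Because each entity type has its own B-/I- columns, a generalization nested inside a stereotype marks the shared tokens with both labels.

```python
from typing import List, Tuple

# Hypothetical label schema; GUS-Net's actual label names may differ.
LABELS = ["O", "B-GEN", "I-GEN", "B-UNFAIR", "I-UNFAIR", "B-STEREO", "I-STEREO"]
INDEX = {name: i for i, name in enumerate(LABELS)}


def spans_to_token_labels(n_tokens: int, spans: List[Tuple[str, int, int]]) -> List[List[int]]:
    """Turn (entity_type, start, end_exclusive) token spans into multi-hot BIO rows.

    Overlapping spans of different types are allowed; a token outside every span gets "O".
    """
    rows = [[0] * len(LABELS) for _ in range(n_tokens)]
    for etype, start, end in spans:
        rows[start][INDEX[f"B-{etype}"]] = 1          # first token of the span
        for i in range(start + 1, end):
            rows[i][INDEX[f"I-{etype}"]] = 1          # remaining tokens of the span
    for row in rows:
        if not any(row):
            row[INDEX["O"]] = 1                        # untagged token
    return rows


tokens = ["Old", "people", "are", "bad", "with", "technology", "."]
spans = [("GEN", 0, 2), ("STEREO", 0, 6)]
rows = spans_to_token_labels(len(tokens), spans)
for token, row in zip(tokens, rows):
    print(f"{token:>10} {row}")
```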

Notebooks

  • Annotation Pipeline: Annotate any dataset with GUS entities
  • BERT Training: Train BERT for multi-label token classification
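When training BERT on word-level labels, subword tokenization means labels must be aligned to tokens. One common approach, sketched below, keeps each word's labels on its first subword and masks the rest. The `word_ids` list mimics what a Hugging Face fast tokenizer's `word_ids()` returns (None for special tokens) but is hard-coded here, and the reduced three-column label set is hypothetical; whether the BERT Training notebook uses this exact strategy is an assumption. Since PyTorch's `BCEWithLogitsLoss` has no `ignore_index` argument, the returned mask would be multiplied into the per-element loss (for example with `reduction="none"`).

```python
from typing import List, Optional, Tuple

IGNORE = -100  # placeholder value for positions excluded from the loss


def align_labels(
    word_ids: List[Optional[int]], word_labels: List[List[int]]
) -> Tuple[List[List[int]], List[int]]:
    """Map word-level multi-hot labels onto subword tokens.

    Only the first subword of each word keeps the word's labels; special tokens
    (word id None) and continuation subwords get IGNORE rows and mask 0.
    """
    n_labels = len(word_labels[0])
    aligned: List[List[int]] = []
    mask: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None or wid == previous:
            aligned.append([IGNORE] * n_labels)
            mask.append(0)
        else:
            aligned.append(list(word_labels[wid]))
            mask.append(1)
        previous = wid
    return aligned, mask


# e.g. [CLS] old people hate smart ##phones [SEP]
word_ids = [None, 0, 1, 2, 3, 3, None]
# Hypothetical reduced columns: O, GEN, STEREO
word_labels = [[0, 1, 1], [0, 1, 1], [0, 0, 1], [0, 0, 1]]
aligned, mask = align_labels(word_ids, word_labels)
print(mask)  # [0, 1, 1, 1, 1, 0, 0]
```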

Blog Post

Binary Classification (Legacy Approach)

Note: Binary classification was explored during development but has significant limitations. It's included here for completeness and comparison purposes.

Limitations of sequence-level classification:

  • Cannot identify which specific words or phrases are biased
  • Makes abstract decisions that can be controversial
  • Provides no interpretability for bias detection decisions
  • Oversimplifies nuanced social biases into binary categories
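A minimal sketch of why the sequence-level view loses information: collapsing token tags into a single "biased"/"fair" label maps a generalization and an unfairness example to the same output, discarding both the location and the type of bias. The collapsing rule and tag names are hypothetical and not necessarily how the legacy notebook built its labels.

```python
from typing import List, Set


def sequence_label(token_tags: List[Set[str]]) -> int:
    """Collapse token-level tag sets into one sequence-level label:
    1 ("biased") if any token carries a non-"O" tag, else 0 ("fair")."""
    return int(any(tag != "O" for tags in token_tags for tag in tags))


generalization = [{"B-GEN"}, {"I-GEN"}, {"O"}, {"O"}]
unfairness = [{"O"}, {"O"}, {"B-UNFAIR"}, {"I-UNFAIR"}]
print(sequence_label(generalization), sequence_label(unfairness))  # 1 1
```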

Notebook

  • Binary Classification Training: Legacy approach; trains BERT for sequence-level classification

Blog Post

Resources & Links

Citation

If you use this work in your research, please cite:

@misc{powers2025gusframeworkbenchmarkingsocial,
      title={The GUS Framework: Benchmarking Social Bias Classification with Discriminative (Encoder-Only) and Generative (Decoder-Only) Language Models}, 
      author={Maximus Powers and Shaina Raza and Alex Chang and Umang Mavani and Harshitha Reddy Jonala and Ansh Tiwari and Hua Wei},
      year={2025},
      eprint={2410.08388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.08388}, 
}
