Evaluation of the Toxicity Detector Pipeline

This repository contains a test dataset for evaluating the KIdeKu Toxicity Detection Pipeline (https://github.com/debatelab/toxicity-detector), along with Jupyter notebooks for data generation and analysis and the detection data generated by the pipeline.

Important

This repository is part of ongoing research work. Due to its small size and construction, the provided test dataset can only offer limited, exploratory insights into the performance of the toxicity detection pipeline. Please refer to the KIdeKu project page (https://compphil2mmae.github.io/research/kideku/) for more information about the project and its outcomes.

Warning

The test dataset contains user-generated content from social media which may include offensive language. Please exercise caution when accessing or using the dataset.

Repository Structure

  • config/: YAML configuration files and metadata for the evaluation.
  • data/:
    • kideku_tox_gold.csv: The test dataset for toxicity detection.
    • detection_outputs/: Contains raw model outputs (YAML files), organized by date.
    • evaluation/: Contains aggregated model outputs that serve as the basis for the evaluation.
      • eval_run_20260119.csv: Example evaluation output file generated with pipeline_config_01.yaml (and 5 text inputs from the test dataset).
      • eval_run_20260120.csv: Example evaluation output file generated with pipeline_config_02.yaml (and 5 text inputs from the test dataset).
      • eval_run_20260120_1.csv: Evaluation output file generated with pipeline_config_02.yaml and the whole test dataset.
  • notebooks/: Jupyter notebooks for data generation and analysis.
  • pyproject.toml: Project metadata and dependency definitions.

KIdeKu Toxicity Gold Standard Dataset

Overview

This dataset is a merged and re-annotated collection of German social media comments used for evaluating the KIdeKu toxicity detector. It combines subsets from two prominent datasets: HASOC 2019 (Goldstandard) and GermEval 2018 (Test).

The dataset contains a total of 285 entries, including original labels and new annotations from two independent annotators.

Genesis and Annotation Process

The dataset was created to provide a high-quality "gold standard" for toxicity classification, specifically distinguishing between personalized and group-based toxicity. Additionally, the dataset addresses ambiguities and uncertainties in annotations by allowing annotators to flag uncertain cases.

Source Datasets

Annotation

A subset of these datasets (61 from HASOC, 53 from GermEval) was re-annotated by two annotators. The annotation followed a common guideline:

  • Personalized Toxicity (PERS): Insults, threats, or harassment directed at an individual without reference to group membership.
  • Group-based Toxicity (GRUP): Hate speech directed at a group or individuals as representatives of a group (based on religion, origin, sexual orientation, etc.).
  • BOTH: Contains both types of toxicity.
  • NONE: No toxicity detected.

The annotators worked independently and later aligned on specific edge cases (e.g., treatment of political groups as GRUP).

Dataset Structure

See also config/eval_metadata.yaml for details about the dataset structure.

Gold Standard CSV Format

The file kideku_tox_gold.csv contains the following columns:

| Column | Description |
| --- | --- |
| text_id | Unique identifier. GermEval IDs are prefixed with germeval2018_, HASOC IDs with hasoc_de. |
| text | The raw text of the comment. |
| TOX_ANNO_1 | Toxicity label from Annotator 1. |
| UNCERTAINTY_ANNO_1 | Flag (1) if Annotator 1 was uncertain. |
| REMARKS_ANNO_1 | Optional remarks from Annotator 1. |
| TOX_ANNO_2 | Toxicity label from Annotator 2. |
| UNCERTAINTY_ANNO_2 | Flag (1) if Annotator 2 was uncertain. |
| REMARKS_ANNO_2 | Optional remarks from Annotator 2. |
| TOX_ANNO_3 | Mapped original label from the source dataset (HASOC/GermEval). |
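
For illustration, here is a minimal sketch of how the gold standard file could be loaded and the annotator columns inspected with pandas. It is not part of the pipeline and only assumes the column names listed above and a path relative to the repository root:

```python
import pandas as pd

# Load the gold standard dataset (path relative to the repository root).
gold = pd.read_csv("data/kideku_tox_gold.csv")

# Raw agreement between the two annotators on the toxicity label.
agreement = (gold["TOX_ANNO_1"] == gold["TOX_ANNO_2"]).mean()
print(f"Annotator 1/2 raw agreement: {agreement:.2%}")

# Comments flagged as uncertain by at least one annotator.
uncertain = gold[(gold["UNCERTAINTY_ANNO_1"] == 1) | (gold["UNCERTAINTY_ANNO_2"] == 1)]
print(f"{len(uncertain)} comments carry an uncertainty flag")
```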

The original labels from the source datasets were mapped to our common format as follows:

HASOC 2019 (Subtask B):

  • OFFN → PERS
  • HATE → GRUP
  • PRFN → NONE
  • NONE → NONE

GermEval 2018 (Subtask 2):

  • INSULT → PERS
  • ABUSE → GRUP
  • PROFANITY → NONE
  • OTHER → NONE
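
As a sketch, this mapping can be expressed as two simple lookup tables, e.g. in Python (the dictionaries merely restate the lists above):

```python
# Mapping of the original source-dataset labels to the common annotation scheme.
HASOC_TO_COMMON = {
    "OFFN": "PERS",
    "HATE": "GRUP",
    "PRFN": "NONE",
    "NONE": "NONE",
}

GERMEVAL_TO_COMMON = {
    "INSULT": "PERS",
    "ABUSE": "GRUP",
    "PROFANITY": "NONE",
    "OTHER": "NONE",
}
```
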
Aggregated Gold Standard Label

We defined an aggregated label (column AGGREGATE_LABEL), which we use as the standard for evaluating the performance of the toxicity detection pipeline.

Rough idea:

  • The original labels are not taken into account for the aggregated label since they stem from different annotation guidelines and are not fully compatible with our current annotation scheme. This is corroborated by the low agreement between the original labels and the new annotations (~57% raw agreement, Krippendorff's alpha ≈ 0.3; see notebooks/goldstandard_analysis.ipynb for details).
  • We construct the aggregate label as follows: we assign UNCLEAR if the two annotators disagree or if both set the uncertainty flag; otherwise, we take the label on which both annotators agree (see the sketch below).
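
A minimal sketch of this aggregation rule in Python (for illustration only; it assumes the column names from the gold standard table above):

```python
import pandas as pd

def aggregate_gold_label(row: pd.Series) -> str:
    """Derive the aggregated gold label from the two annotations."""
    # Disagreement between the annotators -> UNCLEAR.
    if row["TOX_ANNO_1"] != row["TOX_ANNO_2"]:
        return "UNCLEAR"
    # Both annotators flagged the case as uncertain -> UNCLEAR.
    if row["UNCERTAINTY_ANNO_1"] == 1 and row["UNCERTAINTY_ANNO_2"] == 1:
        return "UNCLEAR"
    # Otherwise both annotators agree on the same label.
    return row["TOX_ANNO_1"]

gold = pd.read_csv("data/kideku_tox_gold.csv")
gold["AGGREGATE_LABEL"] = gold.apply(aggregate_gold_label, axis=1)
```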

Detection Output CSV Format

Raw detection outputs from various model runs are stored in the data/detection_outputs/ directory, organized by date. These files contain everything needed to reproduce a run (including model parameters and prompt templates). Each run also generates a CSV file in data/evaluation containing the model predictions alongside the gold standard labels (files of the form eval_run_YYYYMMDD.csv).

These tables contain all relevant columns from kideku_tox_gold.csv along with model predictions in separate columns:

  • PERS_<model identifier>: Predicted personalized toxicity label by the model.
  • HATE_<model identifier>: Predicted hatespeech label by the model.
  • TOX_ANNO_<model identifier>: Aggregated predicted toxicity label by the model (see below).
  • HATE_DETECTION_UID_<model identifier> & PERS_DETECTION_UID_<model identifier>: Unique identifier for the model run (these UIDs refer to the relevant YAML output files in detection_outputs/, which contain specifics of each pipeline run, e.g., the full configuration, preliminary outcomes of the pipeline steps, etc.).
  • PIPELINE_CONFIG: The pipeline configuration file used for the run.
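
For illustration, here is a sketch of how one of these evaluation files could be compared against the aggregated gold label. The model identifier below is a placeholder, and the sketch assumes that the AGGREGATE_LABEL column is carried over into the evaluation file:

```python
import pandas as pd

# Example evaluation run from data/evaluation/ (see the file list above).
runs = pd.read_csv("data/evaluation/eval_run_20260120_1.csv")

# Hypothetical model identifier; replace it with the suffix used in the run's columns.
model = "example_model"

# Agreement between the aggregated model prediction and the aggregated gold label.
accuracy = (runs["AGGREGATE_LABEL"] == runs[f"TOX_ANNO_{model}"]).mean()
print(f"Agreement with the gold standard: {accuracy:.2%}")

# Confusion table of gold labels vs. model predictions.
print(pd.crosstab(runs["AGGREGATE_LABEL"], runs[f"TOX_ANNO_{model}"]))
```
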
Aggregated Toxicity Label

The pipeline is designed to answer two separate detection tasks: detection of personalized toxicity and detection of group-based toxicity (hatespeech). Each task produces a separate output label (one of "true", "false" and "unclear"). The final toxicity label is then derived from these two outputs as follows:

| Personalized Toxicity | Hatespeech | Final Label |
| --- | --- | --- |
| false | false | NONE |
| true | false | PERS |
| false | true | GRUP |
| true | true | BOTH |
| unclear | any | UNCLEAR |
| any | unclear | UNCLEAR |
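
A minimal sketch of this combination rule in plain Python (for illustration only):

```python
def combine_labels(personalized: str, hatespeech: str) -> str:
    """Combine the two detection outputs ("true"/"false"/"unclear") into the final label."""
    if personalized == "unclear" or hatespeech == "unclear":
        return "UNCLEAR"
    if personalized == "true" and hatespeech == "true":
        return "BOTH"
    if personalized == "true":
        return "PERS"
    if hatespeech == "true":
        return "GRUP"
    return "NONE"

assert combine_labels("true", "false") == "PERS"
assert combine_labels("unclear", "false") == "UNCLEAR"
```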

Funding

The Toxicity Detector and its evaluation are part of the project "Opportunities of AI to Strengthen Our Deliberative Culture" (KIdeKu), which was funded by the Federal Ministry of Education, Family Affairs, Senior Citizens, Women and Youth (BMBFSFJ).

BMFSFJ Funding
