Evaluation of the Toxicity Detector Pipeline

This repository contains a test dataset for evaluating the KIdeKu Toxicity Detection Pipeline (https://github.com/debatelab/toxicity-detector), along with Jupyter notebooks for data generation and analysis and the detection data generated by the pipeline.

Important

This repository is part of ongoing research work. Due to its small size and construction, the provided test dataset can only offer limited, exploratory insights into the performance of the toxicity detection pipeline. Please refer to the KIdeKu project page (https://compphil2mmae.github.io/research/kideku/) for more information about the project and its outcomes.

Warning

The test dataset contains user-generated content from social media which may include offensive language. Please exercise caution when accessing or using the dataset.

Repository Structure

  • config/: YAML configuration files and metadata for the evaluation.
  • data/:
    • kideku_tox_gold.csv: The test dataset for toxicity detection.
    • detection_outputs/: Contains raw model outputs (YAML files), organized by date.
    • evaluation/: Contains aggregated model outputs that serve as the basis for the evaluation.
      • eval_run_20260119.csv: Example evaluation output file generated with pipeline_config_01.yaml (and 5 text inputs from the test dataset).
      • eval_run_20260120.csv: Example evaluation output file generated with pipeline_config_02.yaml (and 5 text inputs from the test dataset).
      • eval_run_20260120_1.csv: Evaluation output file generated with pipeline_config_02.yaml and the whole test dataset.
  • notebooks/: Jupyter notebooks for data generation and analysis.
  • pyproject.toml: Project metadata and dependency definitions.

KIdeKu Toxicity Gold Standard Dataset

Overview

This dataset is a merged and re-annotated collection of German social media comments used for evaluating the KIdeKu toxicity detector. It combines subsets from two prominent datasets: HASOC 2019 (Goldstandard) and GermEval 2018 (Test).

The dataset contains a total of 285 entries, including original labels and new annotations from two independent annotators.

Genesis and Annotation Process

The dataset was created to provide a high-quality "gold standard" for toxicity classification, specifically distinguishing between personalized and group-based toxicity. Additionally, the dataset addresses ambiguities and uncertainties in annotations by allowing annotators to flag uncertain cases.

Source Datasets

Annotation

A subset of these datasets (61 from HASOC, 53 from GermEval) was re-annotated by two annotators. The annotation followed a common guideline:

  • Personalized Toxicity (PERS): Insults, threats, or harassment directed at an individual without reference to group membership.
  • Group-based Toxicity (GRUP): Hate speech directed at a group or individuals as representatives of a group (based on religion, origin, sexual orientation, etc.).
  • BOTH: Contains both types of toxicity.
  • NONE: No toxicity detected.

The annotators worked independently and later aligned on specific edge cases (e.g., treatment of political groups as GRUP).

Dataset Structure

See also config/eval_metadata.yaml for details about the dataset structure.

Gold Standard CSV Format

The file kideku_tox_gold.csv contains the following columns:

| Column | Description |
| --- | --- |
| text_id | Unique identifier. GermEval IDs are prefixed with germeval2018_, HASOC IDs with hasoc_de. |
| text | The raw text of the comment. |
| TOX_ANNO_1 | Toxicity label from Annotator 1. |
| UNCERTAINTY_ANNO_1 | Flag (1) if Annotator 1 was uncertain. |
| REMARKS_ANNO_1 | Optional remarks from Annotator 1. |
| TOX_ANNO_2 | Toxicity label from Annotator 2. |
| UNCERTAINTY_ANNO_2 | Flag (1) if Annotator 2 was uncertain. |
| REMARKS_ANNO_2 | Optional remarks from Annotator 2. |
| TOX_ANNO_3 | Mapped original label from the source dataset (HASOC/GermEval). |
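
For illustration, here is a minimal sketch of how the gold standard file could be loaded and the annotator columns inspected with pandas. It is not part of the pipeline and only assumes the column names listed above and a path relative to the repository root:

```python
import pandas as pd

# Load the gold standard dataset (path relative to the repository root).
gold = pd.read_csv("data/kideku_tox_gold.csv")

# Raw agreement between the two annotators on the toxicity label.
agreement = (gold["TOX_ANNO_1"] == gold["TOX_ANNO_2"]).mean()
print(f"Annotator 1/2 raw agreement: {agreement:.2%}")

# Comments flagged as uncertain by at least one annotator.
uncertain = gold[(gold["UNCERTAINTY_ANNO_1"] == 1) | (gold["UNCERTAINTY_ANNO_2"] == 1)]
print(f"{len(uncertain)} comments carry an uncertainty flag")
```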

The original labels from the source datasets were mapped to our common format as follows:

HASOC 2019 (Subtask B):

  • OFFN → PERS
  • HATE → GRUP
  • PRFN → NONE
  • NONE → NONE

GermEval 2018 (Subtask 2):

  • INSULT → PERS
  • ABUSE → GRUP
  • PROFANITY → NONE
  • OTHER → NONE
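
As a sketch, this mapping can be expressed as two simple lookup tables, e.g. in Python (the dictionaries merely restate the lists above):

```python
# Mapping of the original source-dataset labels to the common annotation scheme.
HASOC_TO_COMMON = {
    "OFFN": "PERS",
    "HATE": "GRUP",
    "PRFN": "NONE",
    "NONE": "NONE",
}

GERMEVAL_TO_COMMON = {
    "INSULT": "PERS",
    "ABUSE": "GRUP",
    "PROFANITY": "NONE",
    "OTHER": "NONE",
}
```
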
Aggregated Gold Standard Label

We defined an aggregated label (column AGGREGATE_LABEL), which we use as the standard for evaluating the performance of the toxicity detection pipeline.

Rough idea:

  • The original labels are not taken into account for the aggregated label since they stem from different annotation guidelines and are not fully compatible with our current annotation scheme. This is corroborated by the low agreement between the original labels and the new annotations (~57% raw agreement, Krippendorff's alpha ≈ 0.3; see notebooks/goldstandard_analysis.ipynb for details).
  • We construct the aggregate label as follows: we assign UNCLEAR if the two annotators disagree or if both set the uncertainty flag; otherwise, we take the label on which both annotators agree (see the sketch below).
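
A minimal sketch of this aggregation rule in Python (for illustration only; it assumes the column names from the gold standard table above):

```python
import pandas as pd

def aggregate_gold_label(row: pd.Series) -> str:
    """Derive the aggregated gold label from the two annotations."""
    # Disagreement between the annotators -> UNCLEAR.
    if row["TOX_ANNO_1"] != row["TOX_ANNO_2"]:
        return "UNCLEAR"
    # Both annotators flagged the case as uncertain -> UNCLEAR.
    if row["UNCERTAINTY_ANNO_1"] == 1 and row["UNCERTAINTY_ANNO_2"] == 1:
        return "UNCLEAR"
    # Otherwise both annotators agree on the same label.
    return row["TOX_ANNO_1"]

gold = pd.read_csv("data/kideku_tox_gold.csv")
gold["AGGREGATE_LABEL"] = gold.apply(aggregate_gold_label, axis=1)
```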

Detection Output CSV Format

Raw detection outputs from various model runs are stored in the data/detection_outputs/ directory, organized by date. These files contain everything needed to reproduce a run (including model parameters and prompt templates). Each run also generates a CSV file in data/evaluation containing the model predictions alongside the gold standard labels (files of the form eval_run_YYYYMMDD.csv).

These tables contain all relevant columns from kideku_tox_gold.csv along with model predictions in separate columns:

  • PERS_<model identifier>: Predicted personalized toxicity label by the model.
  • HATE_<model identifier>: Predicted hatespeech label by the model.
  • TOX_ANNO_<model identifier>: Aggregated predicted toxicity label by the model (see below).
  • HATE_DETECTION_UID_<model identifier> & PERS_DETECTION_UID_<model identifier>: Unique identifier for the model run (these UIDs refer to the relevant YAML output files in detection_outputs/, which contain specifics of each pipeline run, e.g., the full configuration, preliminary outcomes of the pipeline steps, etc.).
  • PIPELINE_CONFIG: The pipeline configuration file used for the run.
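
For illustration, here is a sketch of how one of these evaluation files could be compared against the aggregated gold label. The model identifier below is a placeholder, and the sketch assumes that the AGGREGATE_LABEL column is carried over into the evaluation file:

```python
import pandas as pd

# Example evaluation run from data/evaluation/ (see the file list above).
runs = pd.read_csv("data/evaluation/eval_run_20260120_1.csv")

# Hypothetical model identifier; replace it with the suffix used in the run's columns.
model = "example_model"

# Agreement between the aggregated model prediction and the aggregated gold label.
accuracy = (runs["AGGREGATE_LABEL"] == runs[f"TOX_ANNO_{model}"]).mean()
print(f"Agreement with the gold standard: {accuracy:.2%}")

# Confusion table of gold labels vs. model predictions.
print(pd.crosstab(runs["AGGREGATE_LABEL"], runs[f"TOX_ANNO_{model}"]))
```
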
Aggregated Toxicity Label

The pipeline is designed to answer two separate detection tasks: detection of personalized toxicity and detection of group-based toxicity (hatespeech). Each task produces a separate output label (one of "true", "false" and "unclear"). The final toxicity label is then derived from these two outputs as follows:

| Personalized Toxicity | Hatespeech | Final Label |
| --- | --- | --- |
| false | false | NONE |
| true | false | PERS |
| false | true | GRUP |
| true | true | BOTH |
| unclear | any | UNCLEAR |
| any | unclear | UNCLEAR |
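
A minimal sketch of this combination rule in plain Python (for illustration only):

```python
def combine_labels(personalized: str, hatespeech: str) -> str:
    """Combine the two detection outputs ("true"/"false"/"unclear") into the final label."""
    if personalized == "unclear" or hatespeech == "unclear":
        return "UNCLEAR"
    if personalized == "true" and hatespeech == "true":
        return "BOTH"
    if personalized == "true":
        return "PERS"
    if hatespeech == "true":
        return "GRUP"
    return "NONE"

assert combine_labels("true", "false") == "PERS"
assert combine_labels("unclear", "false") == "UNCLEAR"
```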

Funding

The Toxicity Detector and its evaluation are part of the project "Opportunities of AI to Strengthen Our Deliberative Culture" (KIdeKu), which was funded by the Federal Ministry of Education, Family Affairs, Senior Citizens, Women and Youth (BMBFSFJ).

BMFSFJ Funding
