Skip to content

elbbo/hatespeech-sentiment-experiments

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Analysis of Group and Person Address Annotations and negative Sentiments as potential Hate Speech Indicators

This repository contains the programs for the data processing and experiments of the master thesis "Annotation and analysis of group and person address and negative sentiments as potential hate speech indicators", which was conducted at the University of Duisburg-Essen. It includes the programs for the creation of the underlying dataset which was loaded into the tool INCEpTION for an annotation with the proposed annotation scheme. The data used for the annotations and experiments was derived from the Germeval Task 2, 2019 - Shared Task on the Identification of Offensive Language. In addition, two pipelines are provided. An agreement on the annotations of the respective annotators is determined to validate the reliability of the presented annotation scheme. Furthermore, the surface structure of the resulting dataset is evaluated and the experiments for an automated detection of the scheme are applied.

The annotated files in tsv format can be found at src\main\resources\tsv-files. The files used for the experiments in XMI format are located at src\main\resources\set-2. The corresponding TypeSystem.xml is stored there as well.

Prerequisites & Resources

The following items should be installed in your system:

For the determination of the sentiments within the conducted experiments the Broad-Coverage German Sentiment Classification Model for Dialog Systems was used. Further, this process step was implemented using the library dkpro-cassis. The pipelines for the determination of the agreement, the analysis of the surface structure and the majority of the experiments were implemented through the DKPro framework.

Configuration

A pipeline is provided that determines the agreement of the respective coders. The corresponding annotated files are located under src/main/resources/agreement-*.

To generate a new dataset, the following files should first be executed in \preprocessing\dataset_collection

  • emoji_string_to_unicode.py needs to be executed to replace emojis encoded as strings with their corresponding unicode representation (if necessary), and
  • dataset_processor.py should be executed to prepare the files for the pipeline.

However, it should be noted that these preprocessing was designed particularly for the source dataset.

Run the Experimets

To run the pipelines load the repository

  • git clone https://github.com/luckybobo/hatespeech-sentiment-experiments.git.
  • Import as project into your IDE.
  • Then execute App.java in src/main/java/en/uni/due/haring/annotation/analyser/.

The execution with the provided dataset leads to the results, which are discussed, analyzed and presented in the thesis. The results of the experiments are located at src/main/resources/results/*.

License

The present project is distributed under CC BY-NC-SA 4.0, just like the GermEval dataset on which this thesis is based.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors