This repository was archived by the owner on Jan 20, 2026. It is now read-only.

code, documentation, and evaluation framework for my Master’s Thesis

layaxx/masters-thesis

Automatic Induction of Regular Constraints for a Task-General Relation Extraction System

This repository contains the code, documentation, and evaluation framework for my Master’s Thesis in Applied Computer Science at the University of Bamberg.

  • Author: Yannick Lang
  • Supervisor: Dr. Sean Papay (Bamberg NLP-Group)
  • Submission Date: December 6, 2025
  • Defense Date: January 15, 2026

Abstract & Project Goal

Traditional linear-chain Conditional Random Fields (CRFs) excel at local sequence labeling but struggle to enforce global structural constraints. This thesis presents a system that automatically discovers global patterns, such as role ordering and co-occurrence, and encodes them into a Regular-Language-Constrained CRF (RegCCRF).
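To make "role ordering and co-occurrence" concrete, here is a minimal sketch of two such global patterns written as regular expressions over label sequences. The role names, the single-character codes, and both constraints are hypothetical and chosen only for illustration; they are not taken from the thesis. The sketch only checks a finished label sequence, whereas the system described above encodes the constraints into the model itself.

```python
import re

# Hypothetical role inventory: one character per label so that a label
# sequence becomes a string a regular expression can match.
ROLE_CODES = {"O": "o", "Speaker": "s", "Cue": "c", "Message": "m"}


def encode(labels):
    """Map a sequence of role labels to its single-character encoding."""
    return "".join(ROLE_CODES[label] for label in labels)


# Ordering constraint (illustrative): no Speaker may follow a Message.
# A prefix without "m" followed by a suffix without "s".
ORDERING = re.compile(r"[^m]*[^s]*")

# Co-occurrence constraint (illustrative): a Message implies a Cue somewhere.
# Either there is no "m" at all, or the sequence contains a "c".
CO_OCCURRENCE = re.compile(r"[^m]*|.*c.*")


def satisfies(labels, constraints=(ORDERING, CO_OCCURRENCE)):
    """Return True if the whole label sequence matches every constraint."""
    encoded = encode(labels)
    return all(c.fullmatch(encoded) is not None for c in constraints)


if __name__ == "__main__":
    print(satisfies(["Speaker", "Cue", "Message", "O"]))
    print(satisfies(["Message", "Cue", "Speaker"]))
```

Both patterns depend on the whole sequence rather than on adjacent labels, which is the kind of dependency a linear-chain CRF cannot enforce on its own.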

Supported Datasets

The system was evaluated on four datasets, covering four relation extraction and semantic role labeling tasks and one named entity recognition (NER) task:

  • GePaDeSpkAtt: German parliamentary debate events.
  • Genia: Biomedical event extraction and named entity recognition.
  • RiQuA: Speech events in English literature.
  • OntoNotes 5.0: Large-scale general-domain semantic role labeling.

These datasets are not included in this repository.

Repository Structure

Folder               Description
01-proposal/         LaTeX source for the initial thesis proposal.
02-implementation/   Core Logic: Constraint discovery, selection algorithms, and automaton generation.
03-paper/            LaTeX source for the final thesis, including raw evaluation results and plotting scripts.
04-sources/          Archived PDFs (omitted due to copyright).
05-presentation/     Defense slides (PDF) and speaker notes.

Getting Started

Prerequisites

  • Python 3.9
  • PyTorch (CUDA)
  • Specific dependencies listed in 02-implementation/environment.yml
  • Up to 16 GB of VRAM for some model configurations
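
Before setting up the environment, it can help to check the local machine against the list above. The following is a small sketch, not part of the repository: the function name `check_environment` and the shape of its report are assumptions. It reports the Python version and, if PyTorch happens to be installed, whether CUDA is available and how much memory the first GPU has.

```python
import sys


def check_environment():
    """Collect basic facts about the local Python and PyTorch/CUDA setup."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": None,    # PyTorch version string, if PyTorch is importable
        "cuda": False,    # whether a CUDA device is usable
        "vram_gb": None,  # total memory of GPU 0 in GiB, if CUDA is usable
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed yet; report only the Python version.
        return report

    report["torch"] = torch.__version__
    if torch.cuda.is_available():
        report["cuda"] = True
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        report["vram_gb"] = round(total_bytes / 1024**3, 1)
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A reported `vram_gb` below 16 suggests that some of the larger model configurations may not fit on the device.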

Data Note: Due to licensing, datasets are not included.

Key Features

  • Automated Discovery: Identifies recurring patterns in relation structures.
  • Constraint Selection: Algorithms to filter noise and retain high-impact regular constraints.
  • Automaton Integration: Converts induced rules into finite-state automata (FSAs) that interface with the model.
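
The three features above can be illustrated with a toy pipeline. This is a sketch under stated assumptions and not the algorithm implemented in 02-implementation/: it considers only one hypothetical constraint family ("label a never occurs after label b"), uses a plain support threshold for selection, and all function names are made up for illustration.

```python
from itertools import permutations


def discover_precedence(sequences):
    """Find candidate constraints (a, b) meaning 'a never occurs after b'.

    Returns a dict mapping each unviolated pair to its support, i.e. the
    number of sequences that contain both labels.
    """
    violated = set()
    support = {}
    for seq in sequences:
        seen = set()
        for label in seq:
            for earlier in seen:
                if earlier != label:
                    # label occurs after earlier, so (label, earlier) fails.
                    violated.add((label, earlier))
            seen.add(label)
        for pair in permutations(seen, 2):
            support[pair] = support.get(pair, 0) + 1
    return {pair: n for pair, n in support.items() if pair not in violated}


def select(candidates, min_support):
    """Keep only constraints observed in at least min_support sequences."""
    return {pair: n for pair, n in candidates.items() if n >= min_support}


def build_dfa(a, b):
    """Return the transition function of a DFA for 'a never occurs after b'.

    States: 'start' (b not seen yet), 'b_seen', and the rejecting 'dead'.
    """
    def step(state, label):
        if state == "dead":
            return "dead"
        if state == "b_seen" and label == a:
            return "dead"
        if label == b:
            return "b_seen"
        return state
    return step


def accepts(step, seq):
    """Run the DFA over seq; every state except 'dead' is accepting."""
    state = "start"
    for label in seq:
        state = step(state, label)
    return state != "dead"


if __name__ == "__main__":
    training = [
        ["Cue", "Speaker", "Message"],
        ["Speaker", "Cue", "Message"],
        ["Cue", "Message"],
    ]
    kept = select(discover_precedence(training), min_support=3)
    for (a, b) in kept:
        dfa = build_dfa(a, b)
        print((a, b), accepts(dfa, ["Cue", "Message"]), accepts(dfa, ["Message", "Cue"]))
```

The support threshold stands in for the selection step: constraints that hold only because two labels rarely co-occur are dropped, and each surviving constraint becomes a small automaton over label sequences.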
