TomasCCPinto/RelEx-PT

RelEx-PT: A Sentence-Level Relation Extraction Dataset for Portuguese

RelEx-PT is a balanced, sentence-level dataset for Relation Extraction (RE) in Portuguese, designed as a controlled benchmark for developing and evaluating RE models. It covers 18 relation types derived from Wikidata, spanning a variety of domains. The dataset is built with a distant supervision pipeline that links Wikidata triples to Portuguese Wikipedia sentences, followed by an NLI-based filtering step. Each instance contains a Portuguese sentence, an entailed <subject, relation, object> triple, and confidence scores from the NLI-based filtering stage.
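
To make the construction idea concrete, here is a minimal Python sketch of distant supervision followed by score-based filtering. It is an illustration under stated assumptions, not the repository's pipeline code: the `Instance` field names, the `threshold` value, the substring-matching heuristic, and the `toy_score` function (a stand-in for a real NLI model) are all hypothetical, and the example relation is not claimed to be one of the 18 in the dataset.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


@dataclass
class Instance:
    # Hypothetical field names; check the dataset files for the real schema.
    sentence: str
    subject: str
    relation: str
    object: str
    entailment_score: float


def distant_supervision(sentences: List[str], triples: List[Triple]) -> List[Tuple[str, Triple]]:
    """Pair each triple with every sentence mentioning both its subject and object."""
    candidates = []
    for sentence in sentences:
        lowered = sentence.lower()
        for subj, rel, obj in triples:
            if subj.lower() in lowered and obj.lower() in lowered:
                candidates.append((sentence, (subj, rel, obj)))
    return candidates


def nli_filter(
    candidates: List[Tuple[str, Triple]],
    score_fn: Callable[[str, Triple], float],
    threshold: float = 0.5,  # assumed value, for illustration only
) -> List[Instance]:
    """Keep only candidates whose score reaches the threshold."""
    kept = []
    for sentence, (subj, rel, obj) in candidates:
        score = score_fn(sentence, (subj, rel, obj))
        if score >= threshold:
            kept.append(Instance(sentence, subj, rel, obj, score))
    return kept


def toy_score(sentence: str, triple: Triple) -> float:
    """Placeholder for an NLI model: high score only if a cue word appears."""
    return 0.9 if "capital" in sentence.lower() else 0.1


sentences = [
    "Lisboa é a capital de Portugal.",
    "Lisboa recebeu turistas de Portugal e Espanha.",
]
triples = [("Lisboa", "capital of", "Portugal")]

candidates = distant_supervision(sentences, triples)
instances = nli_filter(candidates, toy_score)
for inst in instances:
    print(inst)
```

Both sentences mention the subject and the object, so distant supervision alone would label both. The filtering step is what discards the second sentence, which does not actually express the relation.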


📦 Repository Contents

  • 🗂️ Dataset files — Train and test splits.
  • ⚙️ Pipeline code — Scripts implementing the dataset construction process.
  • 🚀 Execution script (startup.sh) — Runs the full pipeline end-to-end.
  • 💬 Prompt templates — Used in Relation Classification (RC), Relation Triple Extraction (RTE), and Open Information Extraction (OpenIE) experiments.

🧭 Usage Recommendations

Before running or modifying any component, please review the pipeline and its default configuration parameters. If you plan to extend, adapt, or re-run the process (e.g., with different domains, relation types, or data sizes), be sure to analyse and adjust the parameters according to your specific goals and computational resources.

📜 License

This project is licensed under CC BY-SA 4.0.
