RelEx-PT is a balanced, sentence-level dataset for Relation Extraction (RE) in Portuguese, designed as a controlled benchmark for developing and evaluating RE models. It comprises 18 relation types derived from Wikidata, spanning a variety of domains. The dataset is built with a distant supervision pipeline that links Wikidata triples to Portuguese Wikipedia sentences, followed by an NLI-based filtering step. Each instance contains a Portuguese sentence, an entailed <subject, relation, object> triple, and the confidence scores produced by the NLI-based filtering stage.
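The construction process described above can be illustrated with a small, self-contained Python sketch. It is not the project's pipeline code. The record fields (`sentence`, `subject`, `relation`, `obj`, `entailment`), the helper names, the naive string-matching alignment, the example relation labels and the 0.5 threshold are all hypothetical. `score_fn` stands in for an NLI model that would score whether the sentence (premise) entails a verbalised form of the triple (hypothesis).

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout: field names are illustrative assumptions,
# not the dataset's actual schema.
@dataclass(frozen=True)
class Instance:
    sentence: str
    subject: str
    relation: str
    obj: str
    entailment: float  # NLI confidence that the sentence entails the triple


def distant_candidates(triples, sentences):
    """Distant supervision (naive version): pair each triple with every
    sentence that mentions both entity labels verbatim."""
    for subj, rel, obj in triples:
        for sent in sentences:
            if subj in sent and obj in sent:
                yield sent, subj, rel, obj


def nli_filter(candidates, score_fn, threshold=0.5):
    """Keep candidates whose entailment score meets a hypothetical threshold."""
    kept = []
    for sent, subj, rel, obj in candidates:
        score = score_fn(sent, subj, rel, obj)
        if score >= threshold:
            kept.append(Instance(sent, subj, rel, obj, score))
    return kept


def relation_counts(instances):
    """Count instances per relation type, e.g. to check class balance."""
    return Counter(inst.relation for inst in instances)


# Usage with toy data and made-up scores (no real NLI model involved).
triples = [("Lisboa", "country", "Portugal"), ("Camões", "occupation", "poeta")]
sentences = [
    "Lisboa é a capital de Portugal.",
    "Camões foi um poeta português.",
    "Portugal fica na Península Ibérica.",
]
fake_scores = {
    "Lisboa é a capital de Portugal.": 0.93,
    "Camões foi um poeta português.": 0.41,
}
kept = nli_filter(
    distant_candidates(triples, sentences),
    lambda sent, *_: fake_scores.get(sent, 0.0),
    threshold=0.5,
)
print([(i.subject, i.relation, i.obj) for i in kept])
# → [('Lisboa', 'country', 'Portugal')]
print(relation_counts(kept))
```

Passing the scorer in as a function keeps the sketch runnable without a model. A realistic alignment step would likely also need entity linking or alias handling, since exact label matching misses inflected or abbreviated mentions.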
- 🗂️ Dataset files: Train and test splits.
- ⚙️ Pipeline code: Scripts implementing the dataset construction process.
- 🚀 Execution script (`startup.sh`): Runs the full pipeline end-to-end.
- 💬 Prompt templates: Used in the Relation Classification (RC), Relation Triple Extraction (RTE), and Open Information Extraction (OpenIE) experiments.
Before running or modifying any component, please review the pipeline and its default configuration parameters. If you plan to extend, adapt, or re-run the process (e.g., with different domains, relation types, or data sizes), analyse and adjust these parameters to suit your specific goals and available computational resources.
This project is licensed under the CC BY-SA 4.0 licence.