RelEx-PT: A Sentence-Level Relation Extraction Dataset for Portuguese

RelEx-PT is a balanced, sentence-level dataset for Relation Extraction (RE) in Portuguese, designed as a controlled benchmark for developing and evaluating RE models. It covers 18 relation types derived from Wikidata, spanning a variety of domains. The dataset is built with a distant supervision pipeline that links Wikidata triples to Portuguese Wikipedia sentences, followed by an NLI-based filtering step. Each instance contains a Portuguese sentence, a <subject, relation, object> triple entailed by that sentence, and the confidence scores produced by the NLI-based filtering stage.
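The two stages described above can be illustrated with a minimal Python sketch. It is not the repository's pipeline code: the function names, the `Instance` fields, the substring matching heuristic, the `score_fn` callback, and the `0.9` threshold are all assumptions made for illustration. In a real run, `score_fn` would wrap an NLI model that scores how strongly the sentence entails a verbalised triple.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical field names; the released files may use different ones.
    sentence: str
    subject: str
    relation: str
    obj: str
    entailment_score: float


def distant_candidates(triples, sentences):
    """Distant supervision: pair a triple with every sentence that
    mentions both its subject and its object (simple substring match)."""
    candidates = []
    for subj, rel, obj in triples:
        for sent in sentences:
            if subj in sent and obj in sent:
                candidates.append((sent, subj, rel, obj))
    return candidates


def nli_filter(candidates, score_fn, threshold=0.9):
    """Keep candidates whose entailment score reaches the threshold.
    score_fn is a stand-in for an NLI model (an assumption here)."""
    kept = []
    for sent, subj, rel, obj in candidates:
        score = score_fn(sent, subj, rel, obj)
        if score >= threshold:
            kept.append(Instance(sent, subj, rel, obj, score))
    return kept


if __name__ == "__main__":
    triples = [("Lisboa", "capital of", "Portugal")]
    sentences = [
        "Lisboa é a capital de Portugal.",
        "O Porto fica no norte de Portugal.",
    ]
    candidates = distant_candidates(triples, sentences)
    # Toy scorer that always returns 0.95, standing in for a real NLI model.
    for inst in nli_filter(candidates, lambda *_: 0.95):
        print(inst)
```

The filtering step matters because distant supervision alone keeps any sentence that mentions both entities, whether or not it actually expresses the relation.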

📦 Repository Contents

🗂️ Dataset files — Train and test splits.
⚙️ Pipeline code — Scripts implementing the dataset construction process.
🚀 Execution script (startup.sh) — Runs the full pipeline end-to-end.
💬 Prompt templates — Used in Relation Classification (RC), Relation Triple Extraction (RTE), and Open Information Extraction (OpenIE) experiments.

🧭 Usage Recommendations

Before running or modifying any component, please review the pipeline and its default configuration parameters. If you plan to extend, adapt, or re-run the process (e.g., with different domains, relation types, or data sizes), be sure to analyse and adjust the parameters according to your specific goals and computational resources.

📜 License

This project is licensed under CC BY-SA 4.0.
