RelEx-PT: A Sentence-Level Relation Extraction Dataset for Portuguese

RelEx-PT is a balanced, sentence-level dataset for Relation Extraction (RE) in Portuguese, designed as a controlled benchmark for developing and evaluating RE models. It comprises 18 relation types derived from Wikidata, spanning a variety of domains. The dataset is built with a distant supervision pipeline that links Wikidata triples to Portuguese Wikipedia sentences, followed by an NLI-based filtering step. Each instance contains a Portuguese sentence, an entailed <subject, relation, object> triple, and confidence scores from the NLI-based filtering stage.
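The two stages described above can be illustrated with a minimal sketch. It is not the code shipped in this repository: the names `Triple`, `Instance`, `distant_match`, `nli_filter`, and `toy_entail_score` are hypothetical, the substring matching is a simplification of distant supervision, and the toy scorer only stands in for a real NLI model so that the example runs without extra dependencies.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


@dataclass(frozen=True)
class Instance:
    sentence: str
    triple: Triple
    nli_score: float  # confidence from the filtering stage


def distant_match(triples: List[Triple], sentences: List[str]) -> List[Tuple[str, Triple]]:
    """Distant supervision heuristic: pair a triple with every sentence
    that mentions both its subject and its object (case-insensitive)."""
    pairs = []
    for triple in triples:
        for sentence in sentences:
            lowered = sentence.lower()
            if triple.subject.lower() in lowered and triple.obj.lower() in lowered:
                pairs.append((sentence, triple))
    return pairs


def nli_filter(
    pairs: List[Tuple[str, Triple]],
    entail_score: Callable[[str, str], float],
    threshold: float = 0.9,  # assumed value, not the pipeline default
) -> List[Instance]:
    """Keep only pairs whose sentence (premise) entails the verbalised triple
    (hypothesis) with a score at or above the threshold."""
    kept = []
    for sentence, triple in pairs:
        hypothesis = f"{triple.subject} {triple.relation} {triple.obj}"
        score = entail_score(sentence, hypothesis)
        if score >= threshold:
            kept.append(Instance(sentence, triple, score))
    return kept


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: fraction of hypothesis tokens found in the premise."""
    premise_tokens = set(premise.lower().replace(".", "").split())
    hypothesis_tokens = hypothesis.lower().split()
    return sum(tok in premise_tokens for tok in hypothesis_tokens) / len(hypothesis_tokens)


triples = [Triple("Portugal", "capital", "Lisboa")]
sentences = [
    "Lisboa é a capital de Portugal.",
    "Lisboa e Porto são cidades de Portugal.",
    "O Porto fica no norte.",
]

pairs = distant_match(triples, sentences)        # first two sentences mention both entities
kept = nli_filter(pairs, toy_entail_score)       # only the first one passes the filter
for instance in kept:
    print(instance.sentence, instance.triple, instance.nli_score)
```

The second sentence shows why the filtering step matters: it mentions both entities, so distant supervision alone would label it with the relation, even though it does not express it. In practice `toy_entail_score` would be replaced by a call to an actual NLI model.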

📦 Repository Contents

🗂️ Dataset files — Train and test splits.
⚙️ Pipeline code — Scripts implementing the dataset construction process.
🚀 Execution script (startup.sh) — Runs the full pipeline end-to-end.
💬 Prompt templates — Used in Relation Classification (RC), Relation Triple Extraction (RTE), and Open Information Extraction (OpenIE) experiments.

🧭 Usage Recommendations

Before running or modifying any component, please review the pipeline and its default configuration parameters. If you plan to extend, adapt, or re-run the process (e.g., with different domains, relation types, or data sizes), be sure to analyse and adjust the parameters according to your specific goals and computational resources.

📜 License

This project is licensed under the CC BY-SA 4.0 licence.

