
RelEx-PT: A Sentence-Level Relation Extraction Dataset for Portuguese

RelEx-PT is a balanced, sentence-level dataset for Relation Extraction (RE) in Portuguese, designed as a controlled benchmark for developing and evaluating RE models. It comprises 18 relation types derived from Wikidata, spanning a variety of domains. The dataset is built with a distant supervision pipeline that aligns Wikidata triples with sentences from Portuguese Wikipedia, followed by an NLI-based filtering step. Each instance contains a Portuguese sentence, a <subject, relation, object> triple entailed by that sentence, and the confidence scores produced by the NLI-based filtering stage.
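The construction process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repository's pipeline code: the relation IDs, the Portuguese hypothesis templates, the 0.9 threshold, the field names, and the `score_fn` callable are all hypothetical. Candidate pairs come from plain string matching of entity labels (distant supervision), and a candidate is kept only if an NLI scorer judges that the sentence entails a verbalized form of the triple.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Sequence, Tuple

# Hypothetical hypothesis templates keyed by Wikidata property ID
# (P19 = place of birth, P36 = capital). The real pipeline's templates may differ.
TEMPLATES = {
    "P19": "{subj} nasceu em {obj}.",
    "P36": "A capital de {subj} é {obj}.",
}


@dataclass
class Candidate:
    sentence: str
    subj: str
    relation: str
    obj: str


@dataclass
class Instance(Candidate):
    entailment: float  # NLI confidence score kept with the instance


def distant_supervision(sentences: Iterable[str],
                        triples: Sequence[Tuple[str, str, str]]) -> Iterator[Candidate]:
    """Pair each sentence with every triple whose subject and object labels both occur in it."""
    for sentence in sentences:
        for subj, relation, obj in triples:
            if subj in sentence and obj in sentence:
                yield Candidate(sentence, subj, relation, obj)


def nli_filter(candidates: Iterable[Candidate],
               score_fn: Callable[[str, str], float],
               threshold: float = 0.9) -> Iterator[Instance]:
    """Keep candidates whose sentence (premise) entails the verbalized triple (hypothesis)."""
    for c in candidates:
        template = TEMPLATES.get(c.relation)
        if template is None:  # no verbalization for this relation: drop the candidate
            continue
        hypothesis = template.format(subj=c.subj, obj=c.obj)
        score = score_fn(c.sentence, hypothesis)
        if score >= threshold:
            yield Instance(c.sentence, c.subj, c.relation, c.obj, score)


# Toy stand-in for an NLI model: returns a fake entailment probability.
def toy_score(premise: str, hypothesis: str) -> float:
    return 0.95 if "nasceu" in premise else 0.10


sentences = [
    "Fernando Pessoa nasceu em Lisboa em 1888.",
    "Fernando Pessoa viveu em Lisboa.",
]
triples = [("Fernando Pessoa", "P19", "Lisboa")]
kept = list(nli_filter(distant_supervision(sentences, triples), toy_score))
print(kept)
```

Both sentences mention the subject and the object, so distant supervision proposes both as candidates; only the first actually expresses place of birth, and the filter drops the second. In practice `score_fn` would wrap an NLI model returning the entailment probability for a (premise, hypothesis) pair; which model the pipeline uses is not stated here. The sketch also keeps a single entailment score per instance, whereas the released data may store several confidence scores.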


📦 Repository Contents

  • 🗂️ Dataset files: train and test splits.
  • ⚙️ Pipeline code: scripts implementing the dataset construction process.
  • 🚀 Execution script (startup.sh): runs the full pipeline end-to-end.
  • 💬 Prompt templates: used in the Relation Classification (RC), Relation Triple Extraction (RTE), and Open Information Extraction (OpenIE) experiments.
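A quick way to load a split and check how its instances are distributed across relation types is sketched below. It assumes, hypothetically, that each split is stored as JSON Lines with a `relation` field; the actual file format and field names should be checked before adapting it. The toy rows are written to a temporary file only so the sketch runs on its own.

```python
import json
import os
import tempfile
from collections import Counter
from pathlib import Path


def load_split(path):
    """Read a JSON Lines file (assumed format): one JSON object per non-empty line."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def relation_counts(records, key="relation"):
    """Count instances per relation type; 'relation' is a hypothetical field name."""
    return Counter(record[key] for record in records)


# Toy split written to a temporary file so the sketch is self-contained.
rows = [
    {"sentence": "Fernando Pessoa nasceu em Lisboa.", "relation": "P19"},
    {"sentence": "Lisboa é a capital de Portugal.", "relation": "P36"},
    {"sentence": "José Saramago nasceu em Azinhaga.", "relation": "P19"},
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp:
    for row in rows:
        tmp.write(json.dumps(row, ensure_ascii=False) + "\n")

records = load_split(tmp.name)
os.remove(tmp.name)
counts = relation_counts(records)
print(counts.most_common())  # [('P19', 2), ('P36', 1)]
```

On a balanced split, the counts returned by `relation_counts` should be roughly equal across the relation types.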

🧭 Usage Recommendations

Before running or modifying any component, please review the pipeline and its default configuration parameters. If you plan to extend, adapt, or re-run the process (e.g., with different domains, relation types, or data sizes), be sure to analyse and adjust the parameters according to your specific goals and computational resources.

📜 License

This project is licensed under the CC BY-SA 4.0 licence.