Dataset Statement
We introduce the STAtute Retrieval Dataset (STARD), a comprehensive dataset designed to bridge the gap in statute retrieval research by focusing on non-professional queries. STARD comprises 1,543 query cases from real-world legal consultations and a corpus of 55,348 candidate statutory articles.
The data collection for STARD followed a rigorous process to ensure high quality and relevance. Each query was sourced from a real-world consultation and paired with its relevant statutory articles. We paid special attention to ethical considerations: all personal information was removed to preserve privacy and anonymity, so that the dataset adheres to ethical standards for research.
To enhance transparency and facilitate further research in legal informatics, STARD and its associated models and code are released in this GitHub repository. We commit to regularly updating the dataset to reflect changes in legal statutes, maintaining its relevance and accuracy. STARD is freely accessible under the MIT license from our official website, promoting widespread use and encouraging advancements across various fields of legal research.