This repository contains three datasets designed for evaluating and benchmarking large language models (LLMs) on legal question answering (QA) and related tasks.
The first is a synthetic QA dataset built from real legal documents: each question–answer pair is synthesized from publicly available legal texts.
- QA Pairs: Each example includes a question, an answer, and a link to the source document.
- Source Documents: PDF files of the source legal texts
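A single QA pair from this dataset might look like the following sketch. The field names (`question`, `answer`, `source_url`) and the values are illustrative assumptions, not the dataset's documented schema:

```python
import json

# Illustrative shape of one synthetic QA example. The field names
# and contents below are assumptions for demonstration only.
example = {
    "question": "What is the statute of limitations for breach of contract?",
    "answer": "It varies by jurisdiction; many U.S. states allow four to six years.",
    "source_url": "https://example.org/legal-doc.pdf",  # hypothetical link
}

print(json.dumps(example, indent=2))
```

In practice, check the actual files in the repository for the real field names before writing any loading code.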
The second is a smaller version of the LegalBench dataset, reformatted for easier benchmarking.
- Tasks: 129 legal reasoning tasks
- Examples: 10 per task
- Format: Reformatted for benchmark compatibility
- License: Includes only tasks with reuse-friendly licenses
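With 129 tasks and 10 examples each, the full set contains 1,290 examples. A minimal way to verify per-task counts is sketched below; the directory layout (`<root>/<task_name>/examples.jsonl`) is an assumption for illustration, not the dataset's documented structure:

```python
import json
import tempfile
from pathlib import Path

def count_examples(root: Path) -> int:
    """Count non-empty JSONL lines across all task folders under root.

    Assumes a hypothetical layout of one folder per task, each holding
    an examples.jsonl file; adapt to the repository's actual layout.
    """
    total = 0
    for task_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        examples_file = task_dir / "examples.jsonl"
        if examples_file.is_file():
            with examples_file.open() as f:
                total += sum(1 for line in f if line.strip())
    return total

# Demo with a tiny synthetic layout: 2 tasks x 10 examples each.
root = Path(tempfile.mkdtemp())
for task in ("task_a", "task_b"):
    task_dir = root / task
    task_dir.mkdir()
    with (task_dir / "examples.jsonl").open("w") as f:
        for i in range(10):
            f.write(json.dumps({"input": f"q{i}", "label": "yes"}) + "\n")

print(count_examples(root))  # 2 tasks x 10 = 20; the full set would give 1290
```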
The third is a subset of the LegalBench-RAG dataset, focused on retrieval-augmented QA tasks.
- QA Pairs: 200 examples
- Domains: ContractNLI, CUAD, MAUD, PrivacyQA
- Format: CSV files with prompt, expected answer, and reference context
- Use Case: Useful for testing legal document QA with retrieval-based context
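Reading one of these CSV files could look like the minimal sketch below. The column names used here (`prompt`, `expected_answer`, `context`) are assumptions based on the description above; check the actual file headers before relying on them:

```python
import csv
import io

# In-memory stand-in for one of the CSV files. The column names and
# row contents are assumptions for demonstration, not the real data.
sample = io.StringIO(
    "prompt,expected_answer,context\n"
    '"Does the NDA survive termination?","Yes","Section 7: obligations survive termination."\n'
)

rows = list(csv.DictReader(sample))
for row in rows:
    print(row["prompt"], "->", row["expected_answer"])
```

For real files, replace the `io.StringIO` stand-in with `open("path/to/file.csv", newline="")`.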
These datasets support legal QA evaluation, model training, and benchmarking. Please refer to each dataset's source license for usage terms.