This repository contains benchmark datasets by Datasaur
Benchmark Datasets for LLM Evaluation

This repository contains three datasets designed for evaluating and benchmarking large language models (LLMs) on legal question answering (QA) and related tasks.

๐Ÿ“ Datasets

1. Legal QA from Public Court Sources

A synthetic QA dataset in which each question-answer pair is generated from publicly available court documents and other legal texts.

2. LegalBench Subset for LLM Benchmarking

A smaller version of the LegalBench dataset, reformatted for easier benchmarking.

  • Tasks: 129 legal reasoning tasks
  • Examples: 10 examples per task
  • Format: Reformatted for benchmark compatibility
  • License: Includes only tasks with reuse-friendly licenses

3. LegalBench-RAG QA Samples

A subset of the LegalBench-RAG dataset, focused on retrieval-augmented QA tasks.

  • QA Pairs: 200 examples
  • Domains: ContractNLI, CUAD, MAUD, PrivacyQA
  • Format: CSV files with prompt, expected answer, and reference context
  • Use Case: Testing legal document QA with retrieval-based context
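The CSV files can be loaded with standard tooling. The snippet below is a minimal sketch, assuming column names like `prompt`, `expected_answer`, and `reference_context` (the actual headers in the released files may differ) and using a hypothetical inline sample row in place of a real file:

```python
import csv
import io

# Hypothetical sample mirroring the described schema: a prompt, the
# expected answer, and the reference context passage. Column names
# are assumptions; check the headers of the actual CSV files.
sample = io.StringIO(
    "prompt,expected_answer,reference_context\n"
    '"What is the governing law of this agreement?",'
    '"Delaware law",'
    '"This Agreement shall be governed by the laws of Delaware."\n'
)

rows = list(csv.DictReader(sample))

for row in rows:
    # In an evaluation loop, you would feed row["prompt"] together with
    # row["reference_context"] to the model, then score its output
    # against row["expected_answer"].
    print(row["prompt"], "->", row["expected_answer"])
```

To run against the real dataset, replace the `io.StringIO` sample with `open("path/to/file.csv", newline="")`.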

These datasets are designed for legal QA, model training, and benchmarking tasks. Please refer to each dataset's source license for usage terms.
