This project is the official evaluation code for the FATE benchmark. It is an open-source toolkit for generating and verifying Lean 4 solutions to math problems, with support for pass@k metrics and cost tracking.
- Unified generation interface across commercial APIs
- Lean 4 verification with static precheck and batched REPL verification
- pass@k computation and result aggregation
- Cost tracking for API calls
- Python 3.11+
- Lean 4 toolchain and
lakeinstalled if running local verification.
pip install -r requirements.txt- Prepare your model configurations in
config/models.yamland verification configuration inconfig/verify_config.yaml. - Prepare Lean Dependencies: This repository provides three versions of Lean workspaces under the
lean_workspacesdirectory. Runin the corresponding directory before running verification or the full pipeline.lake exe cache get
- Run generation only:
python -m src.generate --model openai_o3 \ --dataset data/FATE-H.json \ --n 100 --k 1 --mode lean
- Run the full pipeline (generate then verify):
python -m src.main --model openai_o3 \ --dataset data/FATE-H.json \ --n 100 --k 1 --mode lean
Outputs are saved under output/generate/<model>/..., and verification summaries are saved under output/verify/... or the paths configured in your YAML files.
Command-Line Arguments:
The src/main.py script for running the full generation and verification pipeline accepts the following arguments:
--model (required): The name of the model to evaluate.--dataset (required): The path to the dataset file.--n (optional, default: 10): The number of problems to process.--k (optional, default: 1): The number of attempts per problem.--api_key (optional): The API key for model calls. If omitted, it falls back to environment variables.--mode (optional, default: "lean"): Modes for different prompts.--timeout (optional): The timeout in seconds for a single verification task. Overrides the setting in the config file if provided.--max_workers (optional): The maximum number of concurrent workers for verification. Overrides the setting in the config file if provided.
src/: Generation, verification, model interfaces, and Lean utilitiesconfig/: YAML configuration files for models and verificationoutput/: Generated and verified resultslogs/: Runtime logslean_workspaces/: Contains different versions of Lean workspaces
MIT License. See the LICENSE file for details.