Evaluation repository for Mercor's APEX benchmarks.
This repository contains the evaluation harness for running large language models (LLMs) against the APEX benchmark suite.
The main evaluation package for APEX-v1-extended is located in `apex-evals-v1-extended/`. See that package's documentation for:
- Installation instructions
- Usage examples
- API reference
- Supported models
Licensed under CC BY 4.0.