Evaluate the performance of LLMs for Q&A in any domain.
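A minimal sketch of what such domain-agnostic Q&A evaluation might look like: score a model's answers against references with exact match and token-level F1 (the standard lexical Q&A metrics). Here `ask_model` and the `(question, reference)` pair format are assumptions, not this project's actual API.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def evaluate(qa_pairs, ask_model):
    """Average exact-match and F1 over (question, reference_answer) pairs."""
    em = f1 = 0.0
    for question, reference in qa_pairs:
        answer = ask_model(question)  # hypothetical LLM client
        em += float(answer.strip().lower() == reference.strip().lower())
        f1 += token_f1(answer, reference)
    n = len(qa_pairs)
    return {"exact_match": em / n, "f1": f1 / n}
```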
Automated search framework for rubric-based reward modeling. Features Evolutionary RTD (population search with elitism and successive halving) and an Iterative RTD baseline. Supports tail-focused objectives, multi-role LLM backends, and rank-based preferences.
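A hedged sketch of the evolutionary loop the description names: a population search over candidate rubrics, where successive halving spends cheap evaluations on the full pool and doubles the budget only for survivors, and elitism carries the best rubrics forward unchanged. `score_rubric` and `mutate` are hypothetical stand-ins; the framework's real interfaces may differ.

```python
import random

def evolve(seed_rubrics, score_rubric, mutate,
           generations=5, elite_frac=0.25, halving_rounds=3, base_budget=8):
    population = list(seed_rubrics)
    for _ in range(generations):
        # Successive halving: evaluate everyone on a small budget,
        # keep the top half, re-score survivors with twice the budget.
        candidates, budget = population, base_budget
        for _ in range(halving_rounds):
            scored = sorted(candidates,
                            key=lambda r: score_rubric(r, budget),
                            reverse=True)
            candidates = scored[: max(1, len(scored) // 2)]
            budget *= 2
        # Elitism: the best survivors pass through unchanged; the rest
        # of the population is refilled with mutations of the elites.
        elites = candidates[: max(1, int(len(population) * elite_frac))]
        population = elites + [mutate(random.choice(elites))
                               for _ in range(len(population) - len(elites))]
    return population[0]
```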
AI-powered student assignment evaluator written in Rust. Supports code, PDF, and DOCX files. Uses local or remote LLMs to grade submissions based on configurable criteria, and exports results to Excel.
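The project itself is written in Rust; the sketch below illustrates the same pipeline in Python (to keep one language across these examples): read a submission, ask an LLM to score each configurable criterion, and export the results to Excel via openpyxl. `call_llm` and the criteria/weights format are assumptions for illustration only.

```python
from pathlib import Path
from openpyxl import Workbook

# Assumed criteria format: name -> maximum points.
CRITERIA = {"correctness": 50, "style": 25, "documentation": 25}

def grade_submission(path: Path, call_llm) -> dict:
    # Plain read works for code files; PDF/DOCX would need text extraction.
    text = path.read_text(errors="replace")
    scores = {}
    for criterion, max_points in CRITERIA.items():
        prompt = (f"Grade this submission on '{criterion}' from 0 to "
                  f"{max_points}. Reply with a number only.\n\n{text}")
        scores[criterion] = float(call_llm(prompt))  # local or remote LLM
    return scores

def export_to_excel(results: dict, out: str = "results.xlsx") -> None:
    """Write one row per student: name followed by per-criterion scores."""
    wb = Workbook()
    ws = wb.active
    ws.append(["student", *CRITERIA])
    for student, scores in results.items():
        ws.append([student, *(scores[c] for c in CRITERIA)])
    wb.save(out)
```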