
[low-priority] mmlu_pro: Allow using LLM-as-judge as parser#101

Open
tomtseng wants to merge 1 commit into main from tomtseng/mmlu-pro-llm-judge-impl


@tomtseng (Collaborator) commented Feb 24, 2026

Changes

In PR #99 we found that LLM-as-a-judge grading for MMLU-Pro works better than the regex parser, though it doesn't change our top-line results much. This PR makes MMLU-Pro configurable to use LLM-as-a-judge grading during evaluation (as opposed to #99, which invokes the LLM-as-a-judge post-hoc). The regex parser remains the default.
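For readers unfamiliar with the pattern, here is a minimal sketch of what a configurable parser switch of this kind might look like. It is not the code in this PR: `ParserKind`, `MMLUProConfig`, `parse_with_regex`, `extract_answer`, and the `judge` callable are all hypothetical names, and the regex is a simplified stand-in for whatever pattern the real parser uses.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class ParserKind(Enum):
    """Hypothetical switch between the two grading strategies."""
    REGEX = "regex"
    LLM_JUDGE = "llm_judge"


@dataclass
class MMLUProConfig:
    # Hypothetical config object; regex stays the default, as the PR describes.
    parser: ParserKind = ParserKind.REGEX


# Simplified pattern (an assumption): matches "answer is (C)" or "answer is C".
# MMLU-Pro questions have up to ten options, hence A-J.
_ANSWER_RE = re.compile(r"[Aa]nswer is \(?([A-J])\)?")


def parse_with_regex(response: str) -> Optional[str]:
    """Return the answer letter found by the regex, or None if nothing matches."""
    match = _ANSWER_RE.search(response)
    return match.group(1) if match else None


def extract_answer(
    response: str,
    config: MMLUProConfig,
    judge: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Dispatch to the regex parser or to an injected LLM judge callable."""
    if config.parser is ParserKind.REGEX:
        return parse_with_regex(response)
    if judge is None:
        raise ValueError("LLM_JUDGE parsing requires a judge callable")
    # The judge is any function mapping a model response to an answer letter;
    # in practice it would wrap a call to a grading model.
    return judge(response)
```

Passing the judge in as a callable keeps the dispatch logic testable with a stub, without calling a real model.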

Testing

New unit tests. I haven't tried running the code fully end-to-end.

@tomtseng tomtseng force-pushed the tomtseng/mmlu-pro-llm-judge-impl branch from 8cb0e39 to 77eaa35 Compare February 24, 2026 15:14
@tomtseng tomtseng changed the title mmlu_pro: Allow using LLM-as-judge as parser [low-priority] mmlu_pro: Allow using LLM-as-judge as parser Feb 24, 2026
@tomtseng tomtseng marked this pull request as ready for review February 24, 2026 15:14
