See https://imobench.github.io for more info.
- IMO-AnswerBench: answerbench_v2.csv
- IMO-ProofBench: proofbench.csv
- IMO-GradingBench: gradingbench.csv
As of 02/12/2026, we updated IMO-Answerbench with answerbench_v2.csv to fix
some problems that had ambiguous problem statements or incorrect answers.
The previous version answerbench.csv is now deprecated.
We would like to thank Xujie from Tsinghua University, Aiden Jung and Hyunwoo Choi from MIT, Youngbeom Jin from Caltech, Jiwon Kang from Seoul National University, and Vineet Gupta and Pranjal Awasthi from Google DeepMind for helping identify the issues with IMO-AnswerBench!
@inproceedings{luong-etal-2025-towards,
title = "Towards Robust Mathematical Reasoning",
author = {Thang Luong and Dawsen Hwang and Hoang H. Nguyen and Golnaz Ghiasi and Yuri Chervonyi and Insuk Seo and Junsu Kim and Garrett Bingham and Jonathan Lee and Swaroop Mishra and Alex Zhai and Clara Huiyi Hu and Henryk Michalewski and Jimin Kim and Jeonghyun Ahn and Junhwi Bae and Xingyou Song and Trieu H. Trinh and Quoc V. Le and Junehyuk Jung},
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
year = "2025",
url = "https://aclanthology.org/2025.emnlp-main.1794/",
}
Copyright 2025 Google LLC
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0
All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode
Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.
This is not an official Google product.