-
Notifications
You must be signed in to change notification settings - Fork 5
Description
Hi, I'm trying to reproduce your benchmarks so that I can compare them to some modifications I'm curious about. In order to avoid costly API calls, I tried sideloading your informalizations found here into postgres https://huggingface.co/datasets/FrenzyMath/mathlib_informal_v4.16.0
I used some LLM agents to write code to get it working, and it's working now for search.
I'm continuing to try and reproduce your results, partiularly the last row of table 5. Would you be able to to provide more scripts and data that you used for your benchmarks? I found the theorems used in the benchmark here: https://aclanthology.org/2024.findings-emnlp.470/ But I lack things like your augmented queries and scripts for computing metrics. I can come up with these on my own but I figured I'd ask if you could share yours.
Thanks,
Isaiah