Skip to content

reproducing benchmarks #17

@bogedy

Description

@bogedy

Hi, I'm trying to reproduce your benchmarks so that I can compare them to some modifications I'm curious about. In order to avoid costly API calls, I tried sideloading your informalizations found here into postgres https://huggingface.co/datasets/FrenzyMath/mathlib_informal_v4.16.0

I used some LLM agents to write code to get it working, and it's working now for search.

I'm continuing to try and reproduce your results, partiularly the last row of table 5. Would you be able to to provide more scripts and data that you used for your benchmarks? I found the theorems used in the benchmark here: https://aclanthology.org/2024.findings-emnlp.470/ But I lack things like your augmented queries and scripts for computing metrics. I can come up with these on my own but I figured I'd ask if you could share yours.

Thanks,
Isaiah

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions