Hi,
I think this is a really cool and useful project.
But I noticed that this test file leaks a golden answer, at `enterprise-deep-research/test_benchmark.py`, line 17 (commit ad0d535):

```python
expected_answer = "1988-96"
```
According to BrowseComp's release notes, all questions and answers should be kept encrypted. This leakage is polluting my model's training/eval data, and perhaps yours as well.
Could you use a dummy QA pair in the test instead of the real data from BrowseComp?
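As a concrete sketch of what I mean (the names and values here are made up for illustration, not taken from your repo), the test could exercise the same code path with a synthetic pair so no real BrowseComp answer ever appears in the repository:

```python
# Hypothetical dummy QA pair: exercises the benchmark test logic
# without embedding any real BrowseComp question or answer.
dummy_question = "In which season did the fictional Example FC win the cup?"
dummy_answer = "2000-01"  # invented value, deliberately not from BrowseComp

def test_benchmark_pipeline():
    # Same shape as the original check, but against the dummy pair
    # instead of the leaked `expected_answer = "1988-96"`.
    expected_answer = dummy_answer
    assert isinstance(expected_answer, str)
    assert expected_answer == "2000-01"

test_benchmark_pipeline()
```

This keeps the test meaningful (it still validates the answer-matching logic) while removing the real benchmark data.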
This would benefit both the community and your own project.
Thanks!