Hello, I have a question regarding the evaluation procedure described in "III. Obtain some top scores".
I tried using top_scores.py to compute the localization results, but I ran into some issues with the script. For example, the repository does not contain a logs_path folder. After replacing logs_path with model_logs and adjusting the script accordingly (sketched below), I was able to run the evaluation. However, the results I obtained do not match those reported in the paper.
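For reference, this is roughly the change I made. The variable names here are my own and not necessarily the ones used inside top_scores.py; the only substantive change is pointing the script at model_logs/ instead of the missing logs_path/ directory:

```python
# Sketch of my adjustment (approximate names, not the exact top_scores.py code):
# read the result files from model_logs/ since logs_path/ does not exist in the repo.
from pathlib import Path

logs_dir = Path("model_logs")  # was: Path("logs_path")
result_files = sorted(logs_dir.glob("*defects4j*"))  # files I assume hold the per-bug scores
print(f"Found {len(result_files)} result files under {logs_dir}/")
```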
Specifically, for the 16B model, the paper reports:
LLMAO with CodeGen-16B
top-1: 88 (22.3%)
top-3: 149 (37.7%)
top-5: 183 (46.3%)
But my output was:
Top 1,3,5 of 395 total bugs for defects4j-16B:
[57 (14.4%) & 76 (19.2%) & 90 (22.8%)]
I'm wondering whether I might be pointing the script at the wrong folder, or whether the contents of model_logs need to be updated. Could you please take a look and clarify this?
Thank you!