I am comparing the results that are already in the repo against the results at /home/turx/EvalBase/results. Some methods have different results in these two locations.
For example, this is a segment from realsumm_abs_summary.txt from the repo, under pearsonr:
But this is the corresponding segment from /home/turx/EvalBase/results, also under pearsonr:
```
trad bertscore-sentence  P   0.087
                         R   0.308
                         F   0.233
new  bertscore-sentence  P  -0.067
                         R   0.326
                         F   0.222
```
I am really worried about the accuracy of the code. Please find out what changed to cause this discrepancy. We have had incidents like this before, and I do not want another one. I do not want to publish a paper based on wrong results.
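To make this kind of check systematic rather than eyeballing segments, here is a minimal sketch that diffs two summary files and reports every (method, stat) pair whose value differs. It assumes the file format shown in the snippet above (a method-name row like `trad bertscore-sentence P 0.087` followed by continuation rows like `R 0.308`); the function names `parse_summary` and `diff_summaries` are hypothetical, not part of EvalBase.

```python
def parse_summary(text):
    """Parse summary lines of the assumed form:
       'trad bertscore-sentence P 0.087'   (method row)
       'R 0.308'                           (continuation row, same method)
    Returns {(method, stat): value}."""
    values = {}
    method = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if len(parts) >= 3:
            # New method row: everything before the last two tokens is the name.
            method = " ".join(parts[:-2])
        stat, value = parts[-2], float(parts[-1])
        values[(method, stat)] = value
    return values


def diff_summaries(a_text, b_text, tol=1e-6):
    """Return entries that differ beyond tol, or exist in only one file."""
    a, b = parse_summary(a_text), parse_summary(b_text)
    diffs = {}
    for key in sorted(a.keys() | b.keys(), key=str):
        va, vb = a.get(key), b.get(key)
        if va is None or vb is None or abs(va - vb) > tol:
            diffs[key] = (va, vb)
    return diffs
```

Running `diff_summaries(open(repo_path).read(), open(local_path).read())` over each pair of summary files would list exactly which methods and stats diverge, which narrows down where the pipeline changed.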