Is there an official evaluation script that directly compares a prediction file against the gold file?