add evaluation script. #12

erip · 2020-12-03T01:18:16Z

Closes #5

Currently computes weighted F1 score across the entire test set.
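For reference, a minimal sketch of what "weighted F1" means here: per-label F1, averaged with weights proportional to each label's support in the gold data. This is an illustration in plain Python, not the code in this PR; the function name `weighted_f1` and the flat-list inputs are assumptions.

```python
from collections import Counter


def weighted_f1(gold, pred):
    """Support-weighted average of per-label F1 (hypothetical helper)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    total = len(gold)
    if total == 0:
        return 0.0
    support = Counter(gold)  # number of gold tokens per label
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each label's F1 by its share of the gold tokens.
        score += (count / total) * f1
    return score
```

Labels that appear only in the predictions contribute no weight of their own, but they still count as false positives for the gold labels they displace.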

TODO: add a confusion matrix; there are some issues with PyTorch-Lightning reducing confusion matrices (CMs)...

erip · 2020-12-03T01:18:32Z

kylebgorman · 2020-12-03T01:23:11Z

Looks good to me. Tag accuracy would also be good; F1 is not helpful for the non-chunking-type tasks. Some folks report whole-sentence accuracy too.
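A rough sketch of the two metrics suggested above, assuming gold and predicted tags are available as parallel lists of per-sentence tag lists. The function names and input layout are hypothetical and not taken from this PR.

```python
def tag_accuracy(gold_sents, pred_sents):
    """Fraction of individual tags predicted correctly (hypothetical helper)."""
    correct = 0
    total = 0
    for gold, pred in zip(gold_sents, pred_sents):
        if len(gold) != len(pred):
            raise ValueError("sentence length mismatch")
        correct += sum(1 for g, p in zip(gold, pred) if g == p)
        total += len(gold)
    return correct / total if total else 0.0


def sentence_accuracy(gold_sents, pred_sents):
    """Fraction of sentences whose tags are all correct (hypothetical helper)."""
    if not gold_sents:
        return 0.0
    # List equality is True only if every tag in the sentence matches.
    exact = sum(1 for gold, pred in zip(gold_sents, pred_sents) if gold == pred)
    return exact / len(gold_sents)
```

Whole-sentence accuracy is much stricter than tag accuracy, since a single wrong tag makes the entire sentence count as an error.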

add evaluation script.

7bbeba9

reformat with black.

37230a0
