Should old evaluation metrics (CEAFe, B3, MUC) be considered inappropriate? #17

I've come across a paper on the ACL Anthology (https://www.aclweb.org/anthology/P16-1060/) which argues that the traditional metrics in the CoNLL-2012 scoring scripts are not well suited to evaluating the coreference resolution task, and which introduces the LEA scorer (implemented in this repository). However, recent publications on this task are still mainly evaluated with the old metrics, and I can't see why. I would be grateful for an explanation.
Thanks :) 
