As far as I understand, in data/training we only have the entity tokens, and each token is labeled with the class that has the maximal value in its row.
However, how are non-entity tokens handled, that is, tokens that don't fit into any of the classes?
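To make my understanding concrete, here is a minimal sketch of the labeling as I read it. The row layout (one row per entity token, one score column per class) and the class names are hypothetical, not taken from the actual files in data/training:

```python
# Hypothetical layout: one row per entity token, one score per class.
CLASSES = ["PER", "ORG", "LOC"]  # hypothetical class names

rows = [
    ("Alice", [0.9, 0.05, 0.05]),
    ("Acme", [0.1, 0.8, 0.1]),
    ("Paris", [0.2, 0.1, 0.7]),
]


def label_rows(rows, classes):
    """Label each token with the class whose score is maximal in its row."""
    labeled = []
    for token, scores in rows:
        # Index of the largest score in this row (argmax).
        best = max(range(len(scores)), key=lambda i: scores[i])
        labeled.append((token, classes[best]))
    return labeled


print(label_rows(rows, CLASSES))
# [('Alice', 'PER'), ('Acme', 'ORG'), ('Paris', 'LOC')]
```

Under this reading, every row is forced into one of the classes, and there is no row at all for a token outside them, which is exactly the case I'm asking about.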