
Document why Montenegrin (sr-ME) is mapped to "srp" and not to "cnr" #6

@sebastian-nagel


@pjox, @laurieburchell

CLD2's detected language Montenegrin (sr-ME) is mapped to Serbian ("srp") and not to "cnr", because several code comments and the 2014 CLD2 release notes state that there was little training data for it and that CLD2 recognizes it as Serbian.

However, a text recognized as Serbian whose URL is in the .me top-level domain may be classified as sr-ME. The annotations in the WARC metadata record then include sr-ME.

This was discussed here.
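
To illustrate the mapping described above, here is a minimal sketch in Python. The table name `CLD2_TO_ISO639_3` and the helper `to_iso639_3` are hypothetical and are not taken from the actual code base; only the sr-ME to "srp" decision comes from this issue, and the other entries are there just for context.

```python
# Hypothetical lookup table from CLD2 language codes to ISO 639-3 codes.
# Only a few entries are shown; the real mapping is not reproduced here.
CLD2_TO_ISO639_3 = {
    "en": "eng",
    "sr": "srp",
    # Montenegrin: CLD2 had little training data for it and recognizes it
    # as Serbian, so it is mapped to "srp" rather than to "cnr".
    "sr-ME": "srp",
}


def to_iso639_3(cld2_code):
    """Return the ISO 639-3 code for a CLD2 code, or None if it is unmapped."""
    return CLD2_TO_ISO639_3.get(cld2_code)
```

With this sketch, a consumer that reads sr-ME from the WARC metadata annotations and looks it up would get "srp", the same result as for plain "sr".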
