Prefer the 'fork' method for creating the worker processes for parallel tagging, if it is supported by the operating system. This is much faster than the 'spawn' method that is the default on some non-Linux systems (issue #14).
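As a rough illustration of the idea (not the tagger's actual code), the preference for 'fork' could be expressed with the standard `multiprocessing` module as below. The helper names `make_context` and `square` are hypothetical and exist only for this sketch.

```python
import multiprocessing as mp


def square(x):
    """Stand-in for the real per-sentence work (hypothetical)."""
    return x * x


def make_context():
    """Return a 'fork' context if the OS supports it, else the platform default."""
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    # get_context(None) returns the default context for the platform.
    return mp.get_context(method)


if __name__ == "__main__":
    ctx = make_context()
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))
```

Using a context object rather than `multiprocessing.set_start_method` keeps the choice local, so other code in the same process is not affected.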
Add option --use-nfkc to the command line interface and option use_nfkc to the constructor of ASPTagger (issue #11). If this option is used, the internal representation of the input data uses Unicode normalization form NFKC. This can be useful for social media input that misuses mathematical symbols for their typographic effects (e.g. “𝕴𝖒𝖕𝖋𝖆𝖚𝖘𝖜𝖊𝖎𝖘” instead of “Impfausweis”).
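The effect of NFKC normalization on such input can be seen with Python's standard `unicodedata` module. This is only a sketch of the normalization step itself; how the tagger applies it internally is not shown here.

```python
import unicodedata

# Mathematical bold Fraktur letters, as sometimes seen in social media posts.
fancy = "𝕴𝖒𝖕𝖋𝖆𝖚𝖘𝖜𝖊𝖎𝖘"

# NFKC replaces compatibility characters with their plain equivalents.
plain = unicodedata.normalize("NFKC", fancy)

print(plain)  # Impfausweis
```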
Add option --sentence-tag to specify an XML tag in the input data that marks sentence boundaries (issue #12). This is particularly useful in combination with the --sentence-tag option of SoMaJo.