Given that:
- most keyboard layouts have no support for fancy letters or punctuation marks such as æ, ’, “”, …, etc.
- many corpus texts don’t use these fancy characters either
- the kalamine analyzer can default to ASCII when these characters are not supported by a keyboard layout: ae instead of æ, ' instead of ’, ... instead of …, "" instead of “”, etc.
our corpus should be “fancified” before being transformed into a JSON dictionary, so as not to penalize keyboard layouts that properly support these special characters. That’s what the fancify.sh script (or the make fancy target) does. But this is still a work in progress: several substitutions are still missing, e.g.:
- straight quote pairs into “”, « », „“ depending on the language
- fine no-break space before ? : ; ! in French
- ¿ sign in Spanish
- dashes rather than --
- etc.
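
To make the substitutions listed above concrete, here is a minimal Python sketch of what a fancification pass could look like. It is not the code of fancify.sh: the `fancify` function, the `QUOTES` table, the language codes, the spacing inside French quotes, and the choice of dash are all assumptions made for illustration. The Spanish ¿ case is left out because it requires finding where a question starts.

```python
import re

NNBSP = "\u202f"  # narrow ("fine") no-break space

# Hypothetical per-language quote pairs; not taken from fancify.sh.
QUOTES = {
    "en": ("\u201c", "\u201d"),                  # English-style double quotes
    "fr": ("\u00ab" + NNBSP, NNBSP + "\u00bb"),  # guillemets; inner spacing is an assumption
    "de": ("\u201e", "\u201c"),                  # German-style low/high quotes
}


def fancify(text: str, lang: str = "en") -> str:
    """Replace ASCII fallbacks with typographic characters (illustrative only)."""
    open_q, close_q = QUOTES[lang]
    # Three dots become a single ellipsis character.
    text = text.replace("...", "\u2026")
    # A pair of straight double quotes on one line becomes language-specific quotes.
    text = re.sub(r'"([^"\n]*)"', lambda m: open_q + m.group(1) + close_q, text)
    # A straight apostrophe becomes a typographic apostrophe.
    text = text.replace("'", "\u2019")
    # A double hyphen becomes a dash; which dash to use is an assumption.
    text = text.replace("--", "\u2014")
    if lang == "fr":
        # An ordinary space before ? : ; ! becomes a narrow no-break space.
        # Text typed without any space before these marks is left untouched.
        text = re.sub(r" ([?:;!])", NNBSP + r"\1", text)
    return text


if __name__ == "__main__":
    print(fancify('He said "hi"... it\'s ok -- really'))
    print(fancify("Quoi ?", lang="fr"))
```

A real implementation would also have to deal with nested or unbalanced quotes, apostrophes used as single quotes, and colons inside times or URLs, which this sketch ignores.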