Better fancify.sh #3

@fabi1cazenave

Given that:

  • most keyboard layouts have no support for fancy letters or punctuation marks such as æ, ’, “”, …, etc.
  • many corpus texts don’t use these fancy characters either
  • the kalamine analyzer can default to ASCII when these characters are not supported by a keyboard layout: ae instead of æ, ' instead of ’, ... instead of …, "" instead of “”, etc.

our corpus should be “fancified” before being transformed into a JSON dictionary, so as not to penalize keyboard layouts that properly support these special characters. That’s what the fancify.sh script (or the make fancy target) does. But this is still a work in progress: several substitutions are still missing, e.g.:

  • straight quote pairs into “”, « », „“ depending on the language
  • fine no-break space before ?:;! in French
  • ¿ sign in Spanish
  • dashes rather than --
  • etc.
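
To make the discussion concrete, here is a rough sketch of what some of these substitutions could look like. It is written in Python for readability and is not the actual fancify.sh implementation; the `fancify` function, the `QUOTES` table and the language codes are all hypothetical names chosen for illustration. The Spanish ¿ case is left out because it requires detecting where a question starts, and the apostrophe rule is deliberately naive (it would also convert single-quote pairs).

```python
import re

NNBSP = "\u202f"  # narrow (fine) no-break space

# Hypothetical per-language quote pairs (opening, closing).
QUOTES = {
    "en": ("\u201c", "\u201d"),                  # “ ”
    "fr": ("\u00ab" + NNBSP, NNBSP + "\u00bb"),  # « » with inner fine no-break spaces
    "de": ("\u201e", "\u201c"),                  # „ “
}


def fancify(text: str, lang: str = "en") -> str:
    """Replace common ASCII fallbacks with typographic characters (sketch)."""
    text = text.replace("...", "\u2026")  # three dots -> ellipsis
    text = text.replace("--", "\u2014")   # double hyphen -> em dash
    text = text.replace("'", "\u2019")    # straight apostrophe -> curly (naive)

    # Straight double-quote pairs on a single line -> language-specific quotes.
    opening, closing = QUOTES[lang]
    text = re.sub(r'"([^"\n]*)"',
                  lambda m: opening + m.group(1) + closing,
                  text)

    if lang == "fr":
        # Fine no-break space before ? : ; ! when they follow a word character.
        # Assumption: the corpus has no URLs or times such as 12:30,
        # which this simple rule would mangle.
        text = re.sub(r"(?<=\w) ?([?:;!])", NNBSP + r"\1", text)

    return text


if __name__ == "__main__":
    print(fancify('He said "wait..." -- didn\'t he?'))
    print(fancify('Il a dit "Bonjour !"', "fr"))
    print(fancify('"Hallo"', "de"))
```

A real implementation would need to run the substitutions in a safe order (as above, ellipsis and dashes before quotes) and would probably want per-corpus exceptions, since rules like these can easily damage text that is not plain prose.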
