GB2312's em dash #18

@Artoria2e5

bsdconv's GB2312 table comes from unicode.org, where it went missing after the EASTASIA charts became obsolete. In quality it is, to some extent, similar to Unicode's Big5 table. (I will refer to GB code points by the hex values unicode.org uses, so add 0x8080 to get the EUC-CN bytes.)
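For readers unfamiliar with the notation, here is a minimal Python sketch of the 0x8080 offset mentioned above. The function names `gb_to_euc_cn` and `euc_cn_to_gb` are made up for illustration and are not part of bsdconv.

```python
def gb_to_euc_cn(code: int) -> bytes:
    """Turn a GB2312 code point in unicode.org hex notation into EUC-CN bytes."""
    high, low = code >> 8, code & 0xFF
    # Both bytes of a GB2312 code point lie in the 94x94 grid 0x21..0x7E.
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError(f"not a GB2312 code point: {code:#06x}")
    # Adding 0x8080 sets the high bit of each byte.
    return (code + 0x8080).to_bytes(2, "big")


def euc_cn_to_gb(data: bytes) -> int:
    """Inverse of gb_to_euc_cn: strip the high bit from each of the two bytes."""
    if len(data) != 2 or not all(0xA1 <= b <= 0xFE for b in data):
        raise ValueError(f"not a two-byte EUC-CN sequence: {data!r}")
    return int.from_bytes(data, "big") - 0x8080


print(gb_to_euc_cn(0x212A).hex())  # the code point discussed in this issue
```

So 212A in this issue corresponds to the byte pair A1 AA in EUC-CN, and 2124 to A1 A4.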

In GB2312-1980, 212A is defined as 破折号 (em dash), but the Unicode mapping gives U+2015 (horizontal bar) instead of U+2014 (em dash), apparently without reading the Chinese text at all. GB2312's decoder should therefore emit U+2014 so that the punctuation comes out right, and the encoder should accept U+2014 too.
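The requested change could be sketched roughly as follows. This is a Python illustration on plain dictionaries, not bsdconv's actual table format; `UPSTREAM_DECODE` and `patch_em_dash` are hypothetical names, and the table holds only the one entry under discussion.

```python
# Hypothetical one-entry excerpt of the unicode.org GB2312 mapping
# (GB code point -> Unicode code point).
UPSTREAM_DECODE = {0x212A: 0x2015}  # HORIZONTAL BAR, per the upstream table


def patch_em_dash(upstream_decode):
    """Return (decode, encode) tables with 212A treated as an em dash."""
    # Start from the upstream tables so existing behaviour is preserved.
    decode = dict(upstream_decode)
    encode = {uni: gb for gb, uni in upstream_decode.items()}

    # Decoder: emit U+2014 EM DASH for 212A.
    decode[0x212A] = 0x2014
    # Encoder: accept U+2014 as well; U+2015 stays accepted from upstream.
    encode[0x2014] = 0x212A
    return decode, encode


decode, encode = patch_em_dash(UPSTREAM_DECODE)
print(f"decode 212A -> U+{decode[0x212A]:04X}")
print(f"encode U+2014 -> {encode[0x2014]:04X}, U+2015 -> {encode[0x2015]:04X}")
```

Keeping U+2015 in the encoder is an assumption on my part, so that text decoded with the old table still encodes.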

By the way, 212A is one of two places where the "Unicode" GB2312-80 table is incompatible with GBK; the other is 2124. You may choose to decode it to the regular, non-fullwidth "middle dot", as GBK does and as W3C CLREQ recommends typographically, but all I am hoping for right now is for the encoder to accept U+00B7.
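An encoder-only alias for 2124 might look like the sketch below, which leaves the decoder untouched. It assumes, from my recollection of the unicode.org table, that 2124 is mapped to U+30FB (katakana middle dot) upstream; `DECODE`, `ENCODE_ALIASES` and `encode_char` are hypothetical names, not bsdconv identifiers.

```python
# Assumed upstream entry (from memory of unicode.org's GB2312 table):
# 2124 decodes to U+30FB KATAKANA MIDDLE DOT.
DECODE = {0x2124: "\u30fb"}

# Extra characters the encoder should accept; the decoder is not changed.
ENCODE_ALIASES = {"\u00b7": 0x2124}  # U+00B7 MIDDLE DOT, as used by GBK


def encode_char(ch, decode=DECODE, aliases=ENCODE_ALIASES):
    """Encode one character to EUC-CN bytes, honouring encoder-only aliases."""
    # Invert the decode table, then layer the aliases on top of it.
    encode = {uni: gb for gb, uni in decode.items()}
    encode.update(aliases)
    if ch not in encode:
        raise ValueError(f"cannot encode U+{ord(ch):04X}")
    # GB code point + 0x8080 gives the EUC-CN byte pair.
    return (encode[ch] + 0x8080).to_bytes(2, "big")


for ch in ("\u00b7", "\u30fb"):
    print(f"U+{ord(ch):04X} -> {encode_char(ch).hex()}")
```

With this arrangement both dots encode to the same bytes, while decoding still yields whatever the decode table says.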
