Just tossing some ideas out there.
The technical report outlines the Quick_Check algorithms for quickly verifying whether a given string is in a given normalization form: Detecting Normalization Forms.
These are YES / NO / MAYBE answers, so in theory it should be possible to implement this as two regexps (a rough sketch follows after the list):
- Whitelist regexp: If this regexp matches, the answer is YES, otherwise continue.
- Blacklist regexp: If this regexp matches, the answer is NO, otherwise MAYBE.
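A minimal sketch of that two-regexp quick check for NFC, assuming the character classes are generated from the Unicode NFC_Quick_Check data. The ranges below are simplified placeholders, not the real tables: the whitelist only matches strings made entirely of characters with NFC_QC = Yes and combining class 0 (guaranteed to be in NFC), the blacklist matches any character with NFC_QC = No.

```js
// Placeholder character classes; a generator would emit the full ranges.
var NFC_QC_YES = /^[\u0000-\u02FF]*$/;         // whole string is "safe" characters
var NFC_QC_NO  = /[\u0340\u0341\u0343\u0344]/; // contains a character that is never in NFC

function quickCheckNFC(str) {
  if (NFC_QC_YES.test(str)) return 'YES';
  if (NFC_QC_NO.test(str))  return 'NO';
  return 'MAYBE';
}

quickCheckNFC('Caf\u00E9');   // 'YES'   (precomposed é)
quickCheckNFC('Cafe\u0301');  // 'MAYBE' (combining acute, needs the full check)
```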
The regexps should be generated automatically. This is complicated by the fact that JavaScript regexps only support UCS-2 and not UTF-16, so we have to calculate the surrogate pairs (and the nested matching groups) ourselves. See punycode.js.
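For illustration, here is a rough sketch (not an actual generator) of turning one astral code point range into a UCS-2 surrogate-pair pattern, in the spirit of punycode.js's UTF-16 handling. The function names are hypothetical, and a real generator would also merge ranges and handle BMP code points.

```js
function hex(cp) {
  // Surrogates are always four hex digits, so no padding is needed here.
  return '\\u' + cp.toString(16).toUpperCase();
}

function astralRangeToPattern(start, end) {
  var hiStart = 0xD800 + ((start - 0x10000) >> 10),
      loStart = 0xDC00 + ((start - 0x10000) & 0x3FF),
      hiEnd   = 0xD800 + ((end - 0x10000) >> 10),
      loEnd   = 0xDC00 + ((end - 0x10000) & 0x3FF);
  if (hiStart === hiEnd) {
    // Same lead surrogate: a single pair with a trail-surrogate class.
    return hex(hiStart) + '[' + hex(loStart) + '-' + hex(loEnd) + ']';
  }
  var parts = [hex(hiStart) + '[' + hex(loStart) + '-\\uDFFF]'];
  if (hiStart + 1 <= hiEnd - 1) {
    // Full middle block: any trail surrogate is in range.
    parts.push('[' + hex(hiStart + 1) + '-' + hex(hiEnd - 1) + '][\\uDC00-\\uDFFF]');
  }
  parts.push(hex(hiEnd) + '[\\uDC00-' + hex(loEnd) + ']');
  return '(?:' + parts.join('|') + ')';
}

// e.g. astralRangeToPattern(0x1D165, 0x1D169)
// produces the pattern source \uD834[\uDD65-\uDD69]
```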
If implemented, we could add test functions, e.g. 'foo'.isNormalization('NFC'). Internally they could also be used to speed up normalization itself. Something like this: use the whitelist regexp to match the longest prefix that is already in the given normalization form, cut that off, and normalize the rest recursively. We only need to normalize the parts that are not already normalized, but we have to be careful about the boundaries between normalized and non-normalized parts (see the sketch below). Some more strategies are outlined in the technical report: Optimization Strategies.
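A rough sketch of the prefix idea, assuming some existing full nfc() implementation is passed in (both names are hypothetical). The "safe" class is again a placeholder for characters with NFC_QC = Yes and combining class 0; backing up one character before the cut is a crude way to handle the boundary, so the tail can still compose with the last starter of the prefix.

```js
var NFC_SAFE_PREFIX = /^[\u0000-\u02FF]*/;  // placeholder: NFC_QC = Yes, ccc = 0

function fastNFC(str, nfc) {
  var cut = NFC_SAFE_PREFIX.exec(str)[0].length;
  if (cut === str.length) return str;  // every character is safe: already in NFC
  // Back up one character so the tail can compose with the preceding starter.
  if (cut > 0) cut -= 1;
  return str.slice(0, cut) + nfc(str.slice(cut));
}
```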
First and foremost, we should build a benchmark test suite, to find out whether these optimizations actually give a speed boost for long strings. It would also be good to know how much they add to the size of the library.
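A minimal benchmark sketch, assuming the hypothetical nfc() and fastNFC() from above; the labels, inputs and iteration counts are just examples.

```js
function bench(label, fn, input, iterations) {
  var start = Date.now();
  for (var i = 0; i < iterations; i++) fn(input);
  console.log(label + ': ' + (Date.now() - start) + ' ms');
}

var ascii = new Array(10001).join('a');        // long input that is already normalized
var mixed = new Array(1001).join('e\u0301 ');  // long input that needs composing

bench('nfc  / ascii', nfc, ascii, 1000);
bench('fast / ascii', function (s) { return fastNFC(s, nfc); }, ascii, 1000);
bench('nfc  / mixed', nfc, mixed, 1000);
bench('fast / mixed', function (s) { return fastNFC(s, nfc); }, mixed, 1000);
```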