@KristjanESPERANTO KristjanESPERANTO commented Aug 6, 2025

This replaces the non-functional concept script (gen_word_error_correction.js) with a working one (gen_word_error_correction.mjs) that is integrated into the build process 🥳

It retrieves all month and weekday names from the Intl API and packs them, together with the manual definitions (in word_error_correction_manual.yaml), into word_error_correction.yaml.

This means that we now support over 240 languages, compared to only a handful previously 🤯 And we don't even have to worry about maintaining the strings, as they are always queried dynamically during the build 😁
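As a rough illustration of the Intl lookup described above, here is a minimal sketch. It is not the code from gen_word_error_correction.mjs: the helper names `weekdayNames` and `monthNames` are made up for this example, and the fixed reference week in January 2024 is just a convenient choice.

```javascript
// Hypothetical helpers; the real generator script may be structured differently.

// Weekday names for a locale, ordered Monday..Sunday.
function weekdayNames(locale, width = "long") {
  const fmt = new Intl.DateTimeFormat(locale, { weekday: width, timeZone: "UTC" });
  // 2024-01-01 was a Monday, so 1..7 January 2024 covers Mo..Su in order.
  return Array.from({ length: 7 }, (_, i) =>
    fmt.format(new Date(Date.UTC(2024, 0, 1 + i)))
  );
}

// Month names for a locale, ordered January..December.
function monthNames(locale, width = "long") {
  const fmt = new Intl.DateTimeFormat(locale, { month: width, timeZone: "UTC" });
  return Array.from({ length: 12 }, (_, m) =>
    fmt.format(new Date(Date.UTC(2024, m, 1)))
  );
}

console.log(weekdayNames("en"));
console.log(monthNames("en", "short"));
```

On a Node build that ships full ICU data, calling `weekdayNames("ja")` should return names such as 月曜日, which a generator can then map to `Mo`.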

Example

Before, the string 月曜日-金曜日 09:00-17:00 was not usable.

Before

Prettified: not possible

Warnings:

月 <--- (Unexpected token: "月" This means that the syntax is not valid at that point or it is currently not supported.) 
(Use `node --trace-uncaught ...` to show where the exception was thrown)

After

Prettified: Mo-Fr 09:00-17:00

Warnings:

月曜日 <--- (Please use the English abbreviation "Mo" for "月曜日".)
月曜日-金曜日 <--- (Please use the English abbreviation "Fr" for "金曜日".)

Short names

With my last commit, I also added short names. However, I had to filter out ambiguous names because there were too many of them. Without the filter, we would get a large number of warnings like these:

  "sat": "Word \"sat\" is ambiguous: Sa (English) or Sep (Hausa) or Sa (Igbo). Please specify language context or use English weekday name."
  "sun": "Word \"sun\" is ambiguous: Su (English) or Su (Faroese) or Jun (Tongan). Please specify language context or use English weekday name."
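The filtering idea can be sketched roughly as follows. This is an assumption about how such a filter might work, not the PR's implementation: the function name `splitAmbiguous` and the `{ word, target, language }` entry shape are hypothetical. A word counts as ambiguous here when different languages map it to different English abbreviations.

```javascript
// Hypothetical sketch of an ambiguity filter; names and data shapes are assumptions.

// entries: [{ word, target, language }]
function splitAmbiguous(entries) {
  const candidates = new Map(); // lowercased word -> [{ target, language }]
  for (const { word, target, language } of entries) {
    const key = word.toLowerCase();
    if (!candidates.has(key)) candidates.set(key, []);
    candidates.get(key).push({ target, language });
  }

  const corrections = {}; // words with exactly one possible target
  const ambiguous = {};   // words with conflicting targets, kept for reporting only
  for (const [word, list] of candidates) {
    const distinctTargets = new Set(list.map((c) => c.target));
    if (distinctTargets.size === 1) {
      corrections[word] = list[0].target;
    } else {
      ambiguous[word] = list.map((c) => `${c.target} (${c.language})`).join(" or ");
    }
  }
  return { corrections, ambiguous };
}

const result = splitAmbiguous([
  { word: "sat", target: "Sa", language: "English" },
  { word: "sat", target: "Sep", language: "Hausa" },
  { word: "sat", target: "Sa", language: "Igbo" },
  { word: "月曜日", target: "Mo", language: "Japanese" },
]);
console.log(result.corrections); // only the unambiguous entry remains
console.log(result.ambiguous);   // "sat" is reported instead of emitted
```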

I'm really glad to be creating this PR now. It took me a lot of time 😴. Since the tests look good, I'd actually like to merge it right away. But since it's quite a significant change, I'll wait a little for your feedback @ypid 🙂

@HolgerJeromin HolgerJeromin left a comment

Nice!
I was not aware the Intl API is available in node.

@KristjanESPERANTO
Collaborator Author

> Nice! I was not aware the Intl API is available in node.

Yes, that's cool. So we don't need any external dependencies for that. It just feels a bit wrong to use brute force to find out which languages are supported by Intl.
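For readers wondering what such a brute-force probe could look like, here is one possible sketch. It is an assumption, not the script from this PR: it enumerates every two- and three-letter code and asks `Intl.DateTimeFormat.supportedLocalesOf()` which of them the runtime has data for. The function name `discoverLanguages` is hypothetical.

```javascript
// Hypothetical sketch; the actual script may probe locales differently.
function discoverLanguages() {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  const codes = [];
  for (const a of letters) {
    for (const b of letters) {
      codes.push(a + b); // two-letter candidates, e.g. "ja"
      for (const c of letters) codes.push(a + b + c); // three-letter candidates, e.g. "haw"
    }
  }
  // supportedLocalesOf() returns the (canonicalized) subset of the given tags
  // that the runtime supports without falling back to the default locale.
  return Intl.DateTimeFormat.supportedLocalesOf(codes);
}

const languages = discoverLanguages();
console.log(languages.length, languages.slice(0, 10));
```

The number of languages found depends on the ICU data bundled with the Node version in use, so the result can differ between build environments.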

@HolgerJeromin

> It just feels a bit wrong to use brute force to find out which languages are supported by Intl.

Yeah, but it only runs locally in the JS engine, and only at build time. So 🫣

@KristjanESPERANTO KristjanESPERANTO force-pushed the feat/gen_word_error_correction branch from fe9b4c9 to 95b6166 Compare August 18, 2025 12:24
@KristjanESPERANTO KristjanESPERANTO force-pushed the feat/gen_word_error_correction branch from 95b6166 to b1fd4ae Compare October 25, 2025 20:47
@KristjanESPERANTO KristjanESPERANTO marked this pull request as draft October 25, 2025 21:00
…endencies

- Replacement for removed non-functional concept script gen_word_error_correction.js with production-ready implementation
- Without external dependencies
- Use native Intl.DateTimeFormat for date/time formatting
- Use native Intl.DisplayNames for dynamic language name resolution
- Add dynamic locale discovery covering 140+ languages
- Implement ambiguous word detection with warning system
These entries are now automatically generated.
- Increase supported languages from 146 to 244
- Maintain low conflict rate with only one additional ambiguous word detected
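The `Intl.DisplayNames` bullet in the commit message above refers to resolving language names such as "Hausa" from a code. A minimal sketch of that lookup, with a hypothetical helper name `languageName`, might look like this:

```javascript
// Hypothetical helper: English display name for a language code.
const languageNames = new Intl.DisplayNames(["en"], { type: "language" });

function languageName(code) {
  // With the default `fallback: "code"`, well-formed but unknown codes
  // are returned unchanged instead of undefined.
  return languageNames.of(code);
}

console.log(languageName("ja")); // "Japanese"
console.log(languageName("ha")); // "Hausa"
```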
@KristjanESPERANTO KristjanESPERANTO force-pushed the feat/gen_word_error_correction branch from b1fd4ae to d573aa3 Compare October 25, 2025 21:24