Add support for phoneme literals in the tokenizer #36
Lyrcaxis
left a comment
Nice feature! But let's make it only work when users explicitly enable it.
Hey, sorry about the slow review time, I've been busy.
Looks good! Awesome feature :) Will have to update the README to display this functionality.
Some future changes that would be nice are:
- Currently, `SpeechGuesser`'s `LowEffort` mode takes the raw text + pronunciation into account for its calculations, without any transformations. It would be good to have the higher-effort modes account for the pronunciation part or map it better (to more characters).
- Add some way to 'validate' that the user indeed wants the transformation to take place (although the current syntax is quite safe and hard to trigger by accident).
Will include in v0.6.2 along with the Japanese + Mandarin improvements 👍
Fixes #35
Tells the tokenizer not to use espeak on parts that look like this: `[Kokoro](/kˈOkəɹO/)`.
It should translate everything before and after that using espeak, but insert this part as `kˈOkəɹO` just before tokenizing.
I'm using the format from misaki because it's easy to detect using regex, but I'm open to suggestions for a better pattern.
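For readers who want to see the idea in isolation, here is a minimal Python sketch of the splitting step described above. It is not the code from this PR: the regex, the function names (`split_phoneme_literals`, `phonemize`), the `enabled` flag, and the `g2p` callback (standing in for the espeak call) are all assumptions made for illustration.

```python
import re

# Assumed pattern for misaki-style literals: [display text](/phonemes/).
# The real pattern used by the PR may differ.
PHONEME_LITERAL = re.compile(r"\[([^\]]+)\]\(/([^/]+)/\)")


def split_phoneme_literals(text):
    """Split text into ("text", chunk) and ("phonemes", chunk) segments."""
    segments = []
    pos = 0
    for match in PHONEME_LITERAL.finditer(text):
        if match.start() > pos:
            # Plain text before the literal still needs grapheme-to-phoneme conversion.
            segments.append(("text", text[pos:match.start()]))
        # Keep only the phonemes; the display text in brackets is dropped.
        segments.append(("phonemes", match.group(2)))
        pos = match.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments


def phonemize(text, g2p, enabled=True):
    """Run g2p (a stand-in for espeak) on plain text, passing literals through."""
    if not enabled:
        # Opt-in behavior: with the feature off, everything goes through g2p.
        return g2p(text)
    return "".join(
        chunk if kind == "phonemes" else g2p(chunk)
        for kind, chunk in split_phoneme_literals(text)
    )
```

Calling `phonemize("Say [Kokoro](/kˈOkəɹO/) now.", g2p)` would hand `"Say "` and `" now."` to `g2p` and splice `kˈOkəɹO` between the results unchanged. The `enabled` flag reflects the review request to make the feature explicit opt-in.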