Add support for phoneme literals in the tokenizer by PhilippNaused · Pull Request #36 · Lyrcaxis/KokoroSharp

PhilippNaused · 2025-06-10T14:31:30Z

Fixes #35

Tells the tokenizer not to use espeak on parts that look like this: [Kokoro](/kˈOkəɹO/).
It should translate everything before and after that using espeak, but insert this part as kˈOkəɹO just before tokenizing.

I'm using the format from misaki because it's easy to detect with a regex, but I'm open to suggestions for a better pattern.
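To illustrate the behaviour described above, here is a minimal Python sketch of the idea. It is not the code from this PR (KokoroSharp is written in C#). The regex, the function names, and the `phonemize` callback (a stand-in for espeak) are assumptions made for the example.

```python
import re
from typing import Callable, List, Tuple

# Assumed pattern for a misaki-style phoneme literal: [display text](/phonemes/).
# Group 1 is the display text, group 2 is the phoneme string between the slashes.
PHONEME_LITERAL = re.compile(r"\[([^\]]+)\]\(/([^/]+)/\)")


def split_phoneme_literals(text: str) -> List[Tuple[str, bool]]:
    """Split text into ordered (segment, is_literal) pairs.

    Plain segments are returned unchanged; literal segments carry only the
    phoneme string, with the surrounding [..](/../) markup stripped.
    """
    segments: List[Tuple[str, bool]] = []
    pos = 0
    for match in PHONEME_LITERAL.finditer(text):
        if match.start() > pos:
            segments.append((text[pos:match.start()], False))
        segments.append((match.group(2), True))
        pos = match.end()
    if pos < len(text):
        segments.append((text[pos:], False))
    return segments


def phonemize_with_literals(text: str, phonemize: Callable[[str], str]) -> str:
    """Run the (hypothetical) phonemizer on plain segments only and
    splice the literal phonemes in verbatim."""
    return "".join(
        segment if is_literal else phonemize(segment)
        for segment, is_literal in split_phoneme_literals(text)
    )
```

With a placeholder phonemizer such as `str.upper`, `phonemize_with_literals("Say [Kokoro](/kˈOkəɹO/) now", str.upper)` leaves `kˈOkəɹO` untouched and transforms only the text around it.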

Lyrcaxis

Nice feature! But let's make it only work when users explicitly enable it.
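One way to make the feature opt-in is to gate the literal detection behind a flag that defaults to off. The Python sketch below only illustrates that idea. The `enabled` parameter and the function name are hypothetical and are not the option actually added to KokoroSharp.

```python
import re
from typing import List

# Assumed misaki-style literal pattern: [display text](/phonemes/).
LITERAL = re.compile(r"\[([^\]]+)\]\(/([^/]+)/\)")


def find_phoneme_literals(text: str, enabled: bool = False) -> List[str]:
    """Return the phoneme strings of all literals in text.

    When the feature is not explicitly enabled, nothing is detected, so the
    caller treats the whole input as ordinary text.
    """
    if not enabled:
        return []
    return [match.group(2) for match in LITERAL.finditer(text)]
```

Defaulting to `False` keeps the existing behaviour for users who happen to have bracketed text in their input and never asked for phoneme literals.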

KokoroSharp/Processing/Tokenizer.cs

Lyrcaxis

Hey, sorry about the slow review time, I've been busy.

Looks good! Awesome feature :) Will have to update the README to display this functionality.

Some future changes that would be nice are:

Currently, SpeechGuesser's LowEffort mode takes the raw text plus the pronunciation into account in its calculations, without any transformations. It would be good to have the higher-effort modes account for the pronunciation part, or map it better (to more characters).
Add some way to 'validate' that the user indeed wants the transformation to take place (although the current syntax is quite safe and unlikely to be triggered by accident).

Will include in v0.6.2 along with the Japanese + Mandarin improvements 👍

PhilippNaused and others added 2 commits June 10, 2025 16:26

Add support for phoneme literals in the tokenizer

ba6bbda

Add release assets to tests

5d36a32

PhilippNaused marked this pull request as ready for review June 12, 2025 10:24

Lyrcaxis requested changes Jun 25, 2025

KokoroSharp/Processing/Tokenizer.cs

Lyrcaxis self-requested a review July 18, 2025 12:57

Lyrcaxis approved these changes Jul 22, 2025

Lyrcaxis merged commit c2ce4ba into Lyrcaxis:main Jul 22, 2025
1 check passed

PhilippNaused deleted the phonemes-literal branch July 22, 2025 20:28

Lyrcaxis merged 2 commits into Lyrcaxis:main from PhilippNaused:phonemes-literal