Description
Describe the bug
When transcribing German text containing 'ur', for example 'kurz', phonemizer with the espeak-ng backend outputs the unknown characters '??'.
Phonemizer version
phonemizer-3.3.0
available backends: espeak-ng-1.50, segments-2.3.0
uninstalled backends: espeak-mbrola, festival
System
Your OS (Linux distribution, Windows, ...) and, optionally, Python version.
Windows WSL Ubuntu 22.04, Python 3.10.12
Linux Ubuntu 22.04, Python 3.12.10
To reproduce
I transcribed the following text:
Aber beginnen wir von vorne, ganz kurz, äh, die Wahlen, der Kan-, die Wahl der Kandidaten oder, äh,...
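using the espeak backend:
from phonemizer import phonemize

text = "Aber beginnen wir von vorne, ganz kurz, äh, die Wahlen, der Kan-, die Wahl der Kandidaten oder, äh,..."
espeak_lang = "de"  # assumed value: espeak-ng language code for German

phonemes = phonemize(
    text,
    language=espeak_lang,
    backend="espeak",
    strip=True,
    preserve_punctuation=False,
    with_stress=True,
    language_switch="remove-flags",
)
print(phonemes)
This produced: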
ˌɑːbɜ bəɡˈɪnən viːɾ fɔn fˈɔɾnə ɡˌants kˈ??ts ˈɛː diː vˈɑːlən dɛɾ kˈɑːn diː vˈɑːl dɛɾ kˌandiːdˈɑːtən ˌoːdɜ ˈɛː...
with '??' appearing in the transcription of the word 'kurz'.
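To narrow down which words trigger the problem, it can help to phonemize words one at a time and flag any output containing '?'. The helper below is a minimal sketch, not part of phonemizer; the name `find_unknown_phonemes` is hypothetical, and it assumes '?' never appears in a valid transcription. The sample transcriptions are copied from the sentence-level output reported above.

```python
from typing import Dict, List

# Character that shows up in the broken output reported above ("kˈ??ts" for "kurz").
UNKNOWN_MARKER = "?"


def find_unknown_phonemes(words: List[str], phonemes: List[str]) -> Dict[str, str]:
    """Return {word: transcription} for every word whose transcription contains '?'.

    `phonemes` must be aligned with `words` (same length, same order).
    """
    if len(words) != len(phonemes):
        raise ValueError("words and phonemes must have the same length")
    return {
        word: phones
        for word, phones in zip(words, phonemes)
        if UNKNOWN_MARKER in phones
    }


# Transcriptions copied from the output reported in this issue.
words = ["ganz", "kurz", "Wahl"]
phonemes = ["ɡˌants", "kˈ??ts", "vˈɑːl"]
print(find_unknown_phonemes(words, phonemes))  # → {'kurz': 'kˈ??ts'}
```

An aligned list can be obtained by passing a list of single words to `phonemize()` (it returns a list of the same length), e.g. with `language="de"` and `backend="espeak"` as in the reproduction above; per-word stress marks may differ slightly from the sentence-level output.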
Expected behavior
The result should be:
ˌɑːbɜ bəɡˈɪnən viːɾ fɔn fˈɔɾnə ɡˌants kˈurts ˈɛː diː vˈɑːlən dɛɾ kˈɑːn diː vˈɑːl dɛɾ kˌandiːdˈɑːtən ˌoːdɜ ˈɛː...
or an equivalent transcription, anything but '??'.
Additional context
N/A