German 'ur' phonemizes to '??' #208

@HyperBlaze456

Describe the bug
When transcribing German text containing 'ur', for example 'kurz', phonemizer with the espeak-ng backend outputs the unknown-character marker ?? instead of phonemes.

Phonemizer version

phonemizer-3.3.0
available backends: espeak-ng-1.50, segments-2.3.0
uninstalled backends: espeak-mbrola, festival

System
Windows WSL Ubuntu 22.04, Python 3.10.12
Linux Ubuntu 22.04, Python 3.12.10

To reproduce
Transcribe the following text:
Aber beginnen wir von vorne, ganz kurz, äh, die Wahlen, der Kan-, die Wahl der Kandidaten oder, äh,...

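with the espeak backend:

    from phonemizer import phonemize

    # Input text as given above (the trailing "..." is elided in the report).
    text = "Aber beginnen wir von vorne, ganz kurz, äh, die Wahlen, der Kan-, die Wahl der Kandidaten oder, äh,..."
    espeak_lang = "de"  # assumption: the report does not state the exact language code

    phonemes = phonemize(
        text,
        language=espeak_lang,
        backend="espeak",
        strip=True,
        preserve_punctuation=False,
        with_stress=True,
        language_switch="remove-flags",
    )
    print(phonemes)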

Result:
ˌɑːbɜ bəɡˈɪnən viːɾ fɔn fˈɔɾnə ɡˌants kˈ??ts ˈɛː diː vˈɑːlən dɛɾ kˈɑːn diː vˈɑːl dɛɾ kˌandiːdˈɑːtən ˌoːdɜ ˈɛː...
The word 'kurz' is transcribed with ?? in place of the phonemes for 'ur'.
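
To spot such failures across a larger corpus, the output can be scanned for the ? marker. The following is only a minimal sketch: `find_unknown_tokens` is a hypothetical helper, not part of phonemizer, and it assumes the output tokens are whitespace-separated as in the result above.

```python
def find_unknown_tokens(phonemes: str, marker: str = "?") -> list[tuple[int, str]]:
    """Return (position, token) for each whitespace-separated token containing the marker."""
    return [
        (index, token)
        for index, token in enumerate(phonemes.split())
        if marker in token
    ]


# First part of the output reported in this issue.
reported = "ˌɑːbɜ bəɡˈɪnən viːɾ fɔn fˈɔɾnə ɡˌants kˈ??ts ˈɛː diː vˈɑːlən"

for index, token in find_unknown_tokens(reported):
    print(f"token {index} contains an unknown-character marker: {token}")
```

With `preserve_punctuation=False` a literal question mark from the input should not survive into the output, so any ? found this way most likely points at a failed transcription; that is an assumption worth verifying on your own data.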

Expected behavior
The output should be:
ˌɑːbɜ bəɡˈɪnən viːɾ fɔn fˈɔɾnə ɡˌants kˈurts ˈɛː diː vˈɑːlən dɛɾ kˈɑːn diː vˈɑːl dɛɾ kˌandiːdˈɑːtən ˌoːdɜ ˈɛː...
or an equivalent transcription; anything but ??.

Additional context
N/A
