Hi there,
tl;dr: click on the Opire link below to add funding or a donation for German language support.
I) Motivation for German language support in Kokoro
I.1) Kokoro is the best-quality open-weight TTS
See, e.g., the leaderboards:
- https://artificialanalysis.ai/text-to-speech/leaderboard?open-weights=true
- https://huggingface.co/spaces/TTS-AGI/TTS-Arena
- https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2
I.2) Kokoro is efficient (and fast; CPU-only is possible)
It can run on a CPU using streaming. Most of the open-weight competitors, even though they rank lower, use far more resources or may need a graphics card to run at all.
The other models in the current top 10 open-weight TTS ranking on Artificial Analysis are not suitable alternatives (see the comparison table below, which I compiled using AI searches).
Comparison table
| # | Model | Typical hardware requirements (inference) | CPU real‑time viability* | German support (out‑of‑the‑box)** |
|---|---|---|---|---|
| 1 | Kokoro 82M v1.0 | Very small (~tens of M params). Runs on CPU or small GPU (≈2–4 GB VRAM). | High – real‑time or near real‑time on a modern CPU. | Unclear / probably no official DE voice in base models. |
| 2 | Fish Speech 1.5 | Medium–large; GPU strongly recommended (≈10–12 GB VRAM for smooth use). | Low – CPU works but far from real‑time. | No clear German support; mainly EN/zh focus. |
| 3 | Maya1 | Large multi‑billion‑param model; needs high‑end GPU (≥16 GB VRAM). | Very low – practical real‑time on CPU not expected. | Unclear – no solid public info about German at cutoff. |
| 4 | Chatterbox | Several variants; Multilingual / Turbo ≈few‑hundred‑M params; GPU (≈8 GB) recommended, CPU possible. | Medium – Turbo/distilled variant can approach RT on a strong CPU; full multilingual is slower. | Yes – multilingual checkpoints explicitly include German. |
| 5 | Zonos‑v0.1 | Medium–large; designed for GPU inference (≈8 GB+ VRAM recommended). | Very low – CPU inference is slow, not interactive. | No – appears English‑only in public releases. |
| 6 | VibeVoice 7B | Very large (7B). Requires powerful GPU(s), typically ≥16–24 GB VRAM. | Effectively none – CPU real‑time is not realistic. | Likely multilingual, but exact DE status not confirmed. |
| 7 | OpenVoice v2 | Medium; CPU possible but GPU (≈6–8 GB+ VRAM) recommended for cloning speed / latency. | Low–medium – typical CPUs are slower than real‑time; strong CPU + heavy tuning might get close. | Unclear – paper stresses EN/zh/ja; DE status uncertain. |
| 8 | Step TTS Mini | Small / edge‑oriented; runs well on CPU or very small GPU (≈2–4 GB VRAM). | High – explicitly designed to run fast on CPU. | No official German – docs only confirm zh + EN. |
| 9 | XTTS v2 (Coqui) | Medium; GPU with ≥8 GB VRAM recommended for smooth multilingual / cloning use. | Low–medium – CPU possible but usually >RT latency. | Yes – XTTS v2 is explicitly multilingual incl. German. |
| 10 | StyleTTS 2 | Medium–large; practical use assumes GPU with ≈8–12 GB VRAM. | Very low – CPU inference is slow, not real‑time. | Base models EN‑only; DE only via community fine‑tunes. |
I.3) There is high demand / many requests
a) See how active these threads have been:
#232
#186
b) German is also one of the most widely spoken languages in Europe, with roughly 95 million L1 and 85 million L2 speakers.
c) As the table shows, there is no real alternative: none of the other models combines high CPU real-time viability with German support.
II) Funding
II.1) Funding motivation
This issue is about funding German language support. There have been many comments, but all in all they probably didn't help @hexgrad much in adding German, because he has specific requirements for the training data. Common Voice, for example, isn't viable, as I read in one of his comments somewhere. Gathering all the data himself is not an efficient use of his time.
So the most straightforward approach is probably to fund him, so that he can produce the German training data the way he needs it (using other TTS systems) and train the model.
II.2) Funding method
I considered polar.sh, Algora, and Opire (I had an AI list some options). I found Opire the most straightforward to get started with, so I'm using that. Feel free to add other platforms to this thread, and please correct me if something is not configured correctly.
II.3) Donation/Funding
Let's go and see how much we can raise to support hexgrad in adding German:
https://app.opire.dev/issues/01KCKZCP14P37WZQZQA9W6BGJJ
You log in with GitHub and pay with PayPal, VISA, or SEPA via Stripe. Don't create a Stripe Express account via the option link, as that is meant for businesses. Use a normal account instead: https://dashboard.stripe.com/register?locale=de-DE
I'm still figuring this out myself (this is my first time using Opire), so I recommend subscribing to this thread and waiting until I have it working. I'll post a message once the payment mechanism works. At first I mistakenly created a Stripe Express account, so I need to try again with a normal Stripe account once the Express account is deleted, which requires contacting support.