Kokoro German β€” HUI Multispeaker Base β€” GGUF

GGUF / ggml conversion of dida-80b/kokoro-german-hui-multispeaker-base for use with CrispStrobe/CrispASR.

This is the Stage-1 multispeaker base of Kokoro-82M re-trained on the HUI Audio Corpus German (CC0, 51 speakers, ~51 h). Predictor + decoder + StyleEncoder were all trained on German, so prosody and timing on long German phrases sound native β€” unlike using the official English-trained Kokoro-82M with a foreign-language voice fallback, which collapses to silence on long German utterances.

This is a base model, not a voice. It pairs with a German voice pack (e.g. df_victoria or dm_martin from the kikiri-tts lineage). For deployable single-speaker production quality, you can run a Stage-2 fine-tune on one HUI speaker (~half-day on an A40); the base alone gives multispeaker-conditioned output.

Training repo: semidark/kokoro-deutsch Β· Discussion: Issue #9

Files

File Quant Size Notes
kokoro-de-hui-base-f16.gguf F16 156 MB Reference quality β€” ASR roundtrip byte-perfect on the long test phrase
kokoro-de-hui-base-q8_0.gguf Q8_0 135 MB Recommended β€” ASR roundtrip identical to F16

Q4_K is not published: it loses the trailing "-izers" suffix on "Phonemizers" and degrades into garbled output ("Guten A, dies ist ein S des Worten von links."). Q8_0 is the smallest safe quant for this backbone.

Auto-routing in CrispASR

When kokoro-de-hui-base-f16.gguf (or its Q8_0 sibling) sits in the same directory as kokoro-82m-f16.gguf, CrispASR silently swaps to the German backbone whenever the user passes -l de (or any de_* / de-* locale). Voice cascade for German: df_victoria β†’ df_eva β†’ ff_siwis. Explicit --voice always wins.

./crispasr --backend kokoro \
    -m kokoro-82m-q8_0.gguf \
    -l de \
    --tts "Guten Tag, dies ist ein Test des deutschen Phonemizers." \
    --tts-output de.wav
# crispasr silently picks kokoro-de-hui-base-q8_0.gguf + kokoro-voice-df_victoria.gguf

The C ABI behind this routing is crispasr_kokoro_resolve_model_for_lang_abi and crispasr_kokoro_resolve_fallback_voice_abi (see src/kokoro.h). The Python wrapper exposes it as crispasr.kokoro_resolve_for_lang(model, lang).

Quality verification

ASR roundtrip via parakeet-tdt-0.6b-v3 -l de on "Guten Tag, dies ist ein Test des deutschen Phonemizers.":

Backbone Γ— voice Quant Parakeet output verdict
dida-80b + dm_martin F16 "Guten Tag, dies ist ein Test des deutschen Phonemizers." byte-perfect
dida-80b + dm_martin Q8_0 "Guten Tag, dies ist ein Test des deutschen Phonemizers." byte-perfect
dida-80b + df_victoria F16 "Guten Tag, dies ist ein Tester des Deutschen Phonemizers." 1 word-boundary err on "Test"
dida-80b + dm_bernd F16 "Guten Tag, dies ist ein Test des deutschen Phonemetzers." 1 phoneme err on "izers"

crispasr-diff kokoro against the PyTorch reference (after rewriting dida-80b's modern parametrize WeightNorm keys to legacy weight_g/weight_v so upstream KModel can load them):

Stage (selection) F16 cos_min Q8_0 cos_min
token_ids β†’ durations 1.000 1.000
text_enc_out 1.000 0.9999
dec_decode_3_out 0.9999 0.999
audio_out (RNG-divergent) 0.91 0.03

F16 hits 14/16 PASS at cosβ‰₯0.999 (the 2 fails are RNG-divergent decoder stages). Q8_0 has weaker audio_out cosine but ASR-roundtrips identically β€” perceptual quality is preserved even where bit-level cosine drops.

Voice packs

Use the sister repo cstr/kokoro-voices-GGUF, which bundles:

Voice Source Tier
df_victoria kikiri-tts/kikiri-german-victoria, Apache-2.0 in-distribution to dida-80b lineage; recommended German default
dm_martin kikiri-tts/kikiri-german-martin, Apache-2.0 byte-perfect ASR roundtrip on test phrase
df_eva, dm_bernd recovered via r1di/kokoro-fastapi-german Git LFS from the deleted Tundragoon repo, Apache-2.0 second-tier German fallback

The kikiri voicepacks are pre-extracted by the dida-80b maintainer using a German-trained StyleEncoder that shares lineage with this base β€” so they're in-distribution to the predictor and decoder bundled here.

Architecture

Identical to hexgrad/Kokoro-82M: same 178-symbol IPA vocab (per semidark/kokoro-deutsch/training/kokoro_symbols.py), same component shapes. Only the weight values differ β€” predictor + decoder + StyleEncoder were retrained on HUI; the BERT prosody encoder uses the same German dbmdz/bert-base-german-cased weights as the upstream kokoro-deutsch recipe.

Conversion

# dida-80b ships only a stub config.json β€” pass the official Kokoro-82M
# config so the 178-symbol IPA vocab is preserved (vocab IDs are byte-
# identical between the two by training-recipe construction).
python models/convert-kokoro-to-gguf.py \
    --input dida-80b/kokoro-german-hui-multispeaker-base \
    --output kokoro-de-hui-base-f16.gguf \
    --config /path/to/hexgrad/Kokoro-82M/config.json \
    --outtype f16

build/bin/crispasr-quantize kokoro-de-hui-base-f16.gguf kokoro-de-hui-base-q8_0.gguf q8_0

License

  • Weights: Apache-2.0 (inherited from Kokoro-82M).
  • Training corpus: HUI-Audio-Corpus-German is CC0; the gated dida-80b/hui-german-51speakers HF dataset uses the same source but is access-controlled.

Attribution

Downloads last month
167
GGUF
Model size
81.7M params
Architecture
kokoro
Hardware compatibility
Log In to add your hardware

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for cstr/kokoro-de-hui-base-GGUF

Quantized
(1)
this model