Models Directory

Complete Directory Structure

models/
├── gguf/                    # Original F32 (FP32) format
│   ├── qwen3_assets.gguf   # 1.5 GB | Assets (tts_pad, text_table, codec_embd, proj_weight, proj_bias)
│   ├── qwen3_tts_predictor.gguf  # TTS prediction model
│   └── qwen3_tts_talker.gguf     # TTS voice synthesis model
│
├── gguf_q8_0/               # Q8_0 (8-bit) quantization
│   ├── qwen3_assets.gguf   # 388 MB | 74% compression | Nearly lossless
│   ├── qwen3_tts_predictor.gguf
│   └── qwen3_tts_talker.gguf
│
├── gguf_q5_k_m/             # Q5_K_M (5-bit K-quant)
│   ├── qwen3_assets.gguf   # 388 MB | Uses Q8_0 for compatibility
│   ├── qwen3_tts_predict.gguf  # 98 MB
│   └── qwen3_tts_talker.gguf  # 960 MB
│
├── onnx/                    # Original ONNX models
│   ├── qwen3_tts_codec_encoder.onnx   # 216 MB | Audio codec encoder
│   ├── qwen3_tts_decoder.onnx         # 436 MB | Audio decoder
│   └── qwen3_tts_speaker_encoder.onnx # 46 MB  | Speaker embedding encoder
│
├── onnx_int8/               # INT8 quantized ONNX
│   ├── qwen3_tts_codec_encoder.onnx   # 104 MB
│   ├── qwen3_tts_decoder.onnx         # 210 MB
│   └── qwen3_tts_speaker_encoder.onnx # 12 MB
│
├── preset_speakers/         # Preset speaker embeddings
│   ├── index.json           # Speaker index
│   ├── aiden.json           # 34 KB
│   ├── dylan.json           # 34 KB
│   ├── eric.json            # 34 KB
│   ├── ono_anna.json        # 34 KB
│   ├── ryan.json            # 34 KB
│   ├── serena.json          # 34 KB
│   ├── sohee.json           # 34 KB
│   ├── uncle_fu.json        # 34 KB
│   └── vivian.json          # 34 KB
│
└── tokenizer/               # Tokenizer
    └── tokenizer.json       # 11 MB | BPE tokenizer vocabulary
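
Before pointing a config at this layout, it can help to confirm that the expected files are actually on disk. The following is a minimal sketch, not part of the project: `find_missing` and `EXPECTED` are hypothetical names, and only the Q8_0 + INT8 file set from the tree above is listed to keep it short.

```python
from pathlib import Path

# Expected files per subdirectory, copied from the tree above.
# Only the Q8_0 + INT8 combination is listed here (illustrative subset).
EXPECTED = {
    "gguf_q8_0": [
        "qwen3_assets.gguf",
        "qwen3_tts_predictor.gguf",
        "qwen3_tts_talker.gguf",
    ],
    "onnx_int8": [
        "qwen3_tts_codec_encoder.onnx",
        "qwen3_tts_decoder.onnx",
        "qwen3_tts_speaker_encoder.onnx",
    ],
    "preset_speakers": ["index.json"],
    "tokenizer": ["tokenizer.json"],
}


def find_missing(models_root, expected=EXPECTED):
    """Return relative paths of expected files not present under models_root."""
    root = Path(models_root)
    missing = []
    for subdir, names in expected.items():
        for name in names:
            if not (root / subdir / name).is_file():
                missing.append(f"{subdir}/{name}")
    return missing
```

Calling `find_missing("models")` returns an empty list when every listed file is present, and otherwise the relative paths that still need to be downloaded.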

Model Components

| Component | File | Description |
|---|---|---|
| Assets | qwen3_assets.gguf | Text embeddings, codec embeddings, projection weights |
| Predictor | qwen3_tts_predict*.gguf | Duration/prosody prediction model |
| Talker | qwen3_tts_talker*.gguf | Audio synthesis neural codec |
| Encoder | qwen3_tts_codec_encoder.onnx | Text → Acoustic tokens |
| Decoder | qwen3_tts_decoder.onnx | Acoustic tokens → Audio waveform |
| Speaker Encoder | qwen3_tts_speaker_encoder.onnx | Reference audio → Speaker embedding |
| Tokenizer | tokenizer.json | Text tokenization (BPE) |

Quantization Comparison

| Format | Bits | Size | Compression | Quality |
|---|---|---|---|---|
| F32 / ONNX | 32 | ~1.5 GB | - | Original |
| Q8_0 / INT8 | 8 | ~388 MB | ~74% | Nearly lossless |
| Q5_K_M | 5 | ~205 MB* | ~86% | Good balance |

*The Q5_K_M assets file uses Q8_0 for compatibility.
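
The compression column reads as a size reduction relative to the F32 file. As a quick check of the arithmetic, the sketch below takes the approximate sizes at face value and treats "~1.5 GB" as 1500 MB, which is an assumption; the function name is hypothetical.

```python
def size_reduction_percent(original_mb, quantized_mb):
    """Percentage by which the quantized file is smaller than the original."""
    return (1 - quantized_mb / original_mb) * 100


F32_MB = 1500  # "~1.5 GB" from the table, taken as 1500 MB (assumption)

for name, size_mb in [("Q8_0", 388), ("Q5_K_M", 205)]:
    print(f"{name}: {size_reduction_percent(F32_MB, size_mb):.0f}% smaller")
```

Under that assumption the results round to 74% and 86%, matching the table.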

Preset Speakers

Available speakers: aiden, dylan, eric, ono_anna, ryan, serena, sohee, uncle_fu, vivian

Usage example:

{
  "speaker": "serena",
  "preset_speaker_path": "models/preset_speakers/serena.json"
}
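
A small helper can build this fragment from a speaker name and reject names that are not in the preset list. This is a sketch under the assumption that each preset lives at `<preset_speakers_dir>/<name>.json`, as the directory tree suggests; `speaker_entry` is a hypothetical helper, not part of the project.

```python
from pathlib import Path

# Speaker names as listed above.
PRESET_SPEAKERS = (
    "aiden", "dylan", "eric", "ono_anna", "ryan",
    "serena", "sohee", "uncle_fu", "vivian",
)


def speaker_entry(name, speakers_dir="models/preset_speakers"):
    """Build the config fragment for a preset speaker, rejecting unknown names."""
    if name not in PRESET_SPEAKERS:
        raise ValueError(f"unknown preset speaker: {name!r}")
    # Assumed layout: one JSON file per speaker, named after the speaker.
    path = Path(speakers_dir) / f"{name}.json"
    return {"speaker": name, "preset_speaker_path": path.as_posix()}
```

For example, `speaker_entry("serena")` yields the same two keys as the fragment above.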

Usage

In config.json:

{
  "model_dir": "models/gguf_q8_0",
  "assets": "qwen3_assets.gguf",
  "tokenizer_path": "models/tokenizer/tokenizer.json",
  "preset_speakers_dir": "models/preset_speakers"
}
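
To illustrate how these keys might map to files on disk, here is a minimal sketch. It assumes that `assets` is a file name relative to `model_dir`, which the config does not state explicitly; `resolve_model_paths` is a hypothetical helper, and the actual loader may resolve paths differently.

```python
import json
from pathlib import Path


def resolve_model_paths(config):
    """Map the config keys shown above to concrete paths.

    Assumption: "assets" is a file name relative to "model_dir".
    """
    model_dir = Path(config["model_dir"])
    return {
        "assets": model_dir / config["assets"],
        "tokenizer": Path(config["tokenizer_path"]),
        "preset_speakers": Path(config["preset_speakers_dir"]),
    }


# Same content as the config.json example above, inlined for the sketch.
CONFIG_TEXT = """
{
  "model_dir": "models/gguf_q8_0",
  "assets": "qwen3_assets.gguf",
  "tokenizer_path": "models/tokenizer/tokenizer.json",
  "preset_speakers_dir": "models/preset_speakers"
}
"""

paths = resolve_model_paths(json.loads(CONFIG_TEXT))
for key, path in paths.items():
    print(f"{key}: {path}")
```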

Recommended

  • Best quality: gguf/ + onnx/
  • Recommended: gguf_q8_0/ + onnx_int8/ (best balance)
  • Smallest size: gguf_q5_k_m/ + onnx_int8/
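
The three pairings above can be captured in a small lookup so that a script selects both directories together. This is only a sketch: the profile keys and the `model_dirs` helper are hypothetical labels chosen here, not options the project defines.

```python
# Directory pairs from the recommendations above; the profile keys are
# hypothetical labels chosen for this sketch.
PROFILES = {
    "best_quality": ("gguf", "onnx"),
    "balanced": ("gguf_q8_0", "onnx_int8"),
    "smallest": ("gguf_q5_k_m", "onnx_int8"),
}


def model_dirs(profile="balanced", root="models"):
    """Return (gguf_dir, onnx_dir) for the chosen profile."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    gguf_dir, onnx_dir = PROFILES[profile]
    return f"{root}/{gguf_dir}", f"{root}/{onnx_dir}"
```

The first element of the returned pair is what would go into `model_dir` in config.json.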