Models Directory
Directory Structure
models/
├── gguf/ # Original F32 (FP32) format
│   ├── qwen3_assets.gguf # 1.5 GB | Assets (tts_pad, text_table, codec_embd, proj_weight, proj_bias)
│   ├── qwen3_tts_predictor.gguf # TTS prediction model
│   └── qwen3_tts_talker.gguf # TTS voice synthesis model
│
├── gguf_q8_0/ # Q8_0 (8-bit) quantization
│   ├── qwen3_assets.gguf # 388 MB | 74% compression | Nearly lossless
│   ├── qwen3_tts_predictor.gguf
│   └── qwen3_tts_talker.gguf
│
├── gguf_q5_k_m/ # Q5_K_M (5-bit K-quant)
│   ├── qwen3_assets.gguf # 388 MB | Uses Q8_0 for compatibility
│   ├── qwen3_tts_predict.gguf # 98 MB
│   └── qwen3_tts_talker.gguf # 960 MB
│
├── onnx/ # Original ONNX models
│   ├── qwen3_tts_codec_encoder.onnx # 216 MB | Audio codec encoder
│   ├── qwen3_tts_decoder.onnx # 436 MB | Audio decoder
│   └── qwen3_tts_speaker_encoder.onnx # 46 MB | Speaker embedding encoder
│
├── onnx_int8/ # INT8 quantized ONNX
│   ├── qwen3_tts_codec_encoder.onnx # 104 MB
│   ├── qwen3_tts_decoder.onnx # 210 MB
│   └── qwen3_tts_speaker_encoder.onnx # 12 MB
│
├── preset_speakers/ # Preset speaker embeddings
│   ├── index.json # Speaker index
│   ├── aiden.json # 34 KB
│   ├── dylan.json # 34 KB
│   ├── eric.json # 34 KB
│   ├── ono_anna.json # 34 KB
│   ├── ryan.json # 34 KB
│   ├── serena.json # 34 KB
│   ├── sohee.json # 34 KB
│   ├── uncle_fu.json # 34 KB
│   └── vivian.json # 34 KB
│
└── tokenizer/ # Tokenizer
    └── tokenizer.json # 11 MB | BPE tokenizer vocabulary
Model Components
| Component | File | Description |
|---|---|---|
| Assets | qwen3_assets.gguf | Text embeddings, codec embeddings, projection weights |
| Predictor | qwen3_tts_predict*.gguf | Duration/prosody prediction model |
| Talker | qwen3_tts_talker*.gguf | Audio synthesis neural codec |
| Encoder | qwen3_tts_codec_encoder.onnx | Audio → Acoustic tokens |
| Decoder | qwen3_tts_decoder.onnx | Acoustic tokens → Audio waveform |
| Speaker Encoder | qwen3_tts_speaker_encoder.onnx | Reference audio → Speaker embedding |
| Tokenizer | tokenizer.json | Text tokenization (BPE) |
Quantization Comparison
| Format | Bits | Size | Compression | Quality |
|---|---|---|---|---|
| F32 / ONNX | 32 | ~1.5 GB | - | Original |
| Q8_0 / INT8 | 8 | ~388 MB | ~74% | Nearly lossless |
| Q5_K_M | 5 | ~205 MB* | ~86% | Good balance |
*In gguf_q5_k_m/, the assets file uses Q8_0 for compatibility, so it is 388 MB on disk.
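The compression column can be reproduced from the sizes as 1 - quantized size / original size. The sketch below assumes decimal units (1.5 GB taken as 1500 MB) and uses the approximate sizes from the table; `compression_pct` is an illustrative helper, not part of this repository.

```python
def compression_pct(original_mb: float, quantized_mb: float) -> float:
    """Return the size reduction as a percentage of the original size."""
    if original_mb <= 0:
        raise ValueError("original_mb must be positive")
    return (1.0 - quantized_mb / original_mb) * 100.0


# Approximate sizes from the table (assumption: 1.5 GB taken as 1500 MB).
F32_MB = 1500
sizes_mb = {"Q8_0": 388, "Q5_K_M": 205}

for name, size in sizes_mb.items():
    print(f"{name}: ~{compression_pct(F32_MB, size):.0f}% smaller than F32")
```

With these inputs the helper gives roughly 74% for Q8_0 and 86% for Q5_K_M, matching the table.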
Preset Speakers
Available speakers: aiden, dylan, eric, ono_anna, ryan, serena, sohee, uncle_fu, vivian
Usage example:
{
  "speaker": "serena",
  "preset_speaker_path": "models/preset_speakers/serena.json"
}
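A fragment like the one above could be generated for any preset name. This is a minimal sketch that assumes each speaker listed above maps to a `<name>.json` file in the preset directory; `speaker_entry` and `PRESET_SPEAKERS` are hypothetical names introduced here for illustration.

```python
from pathlib import PurePosixPath

# Speaker names listed above; each is assumed to map to "<name>.json".
PRESET_SPEAKERS = ("aiden", "dylan", "eric", "ono_anna", "ryan",
                   "serena", "sohee", "uncle_fu", "vivian")


def speaker_entry(name: str, speakers_dir: str = "models/preset_speakers") -> dict:
    """Build the speaker fragment for a known preset name."""
    if name not in PRESET_SPEAKERS:
        raise ValueError(f"unknown preset speaker: {name!r}")
    path = PurePosixPath(speakers_dir) / f"{name}.json"
    return {"speaker": name, "preset_speaker_path": str(path)}


print(speaker_entry("serena"))
```

Rejecting unknown names early keeps a typo from surfacing later as a missing-file error.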
Usage
In config.json:
{
  "model_dir": "models/gguf_q8_0",
  "assets": "qwen3_assets.gguf",
  "tokenizer_path": "models/tokenizer/tokenizer.json",
  "preset_speakers_dir": "models/preset_speakers"
}
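Before starting inference it can help to check that the configured paths exist. The sketch below assumes that `assets` is resolved relative to `model_dir` and that the other two keys are already full paths; `resolve_paths` and `missing_paths` are hypothetical helpers, and how the actual runtime reads config.json may differ.

```python
import json
from pathlib import Path


def resolve_paths(config: dict) -> dict:
    """Map the config keys to concrete paths (no disk access)."""
    model_dir = Path(config["model_dir"])
    return {
        # Assumption: "assets" is a file name inside "model_dir".
        "assets": model_dir / config["assets"],
        "tokenizer": Path(config["tokenizer_path"]),
        "preset_speakers": Path(config["preset_speakers_dir"]),
    }


def missing_paths(paths: dict) -> list:
    """Return the names of entries that do not exist on disk."""
    return [name for name, p in paths.items() if not p.exists()]


config = json.loads("""
{
  "model_dir": "models/gguf_q8_0",
  "assets": "qwen3_assets.gguf",
  "tokenizer_path": "models/tokenizer/tokenizer.json",
  "preset_speakers_dir": "models/preset_speakers"
}
""")
paths = resolve_paths(config)
print(paths["assets"].as_posix())
print("missing:", missing_paths(paths))
```

Run from the directory that contains `models/`, an empty `missing` list means all three configured locations were found.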
Recommended
- Best quality: gguf/ + onnx/
- Recommended (best balance): gguf_q8_0/ + onnx_int8/
- Smallest size: gguf_q5_k_m/ + onnx_int8/
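The pairings above can be kept in a small lookup so that a single setting selects both directories. This is only a sketch: the profile names and `profile_dirs` are made up for illustration, and the `models` root is assumed from the directory structure.

```python
# Pairings from the recommendations: (GGUF directory, ONNX directory).
PROFILES = {
    "best_quality": ("gguf", "onnx"),
    "recommended": ("gguf_q8_0", "onnx_int8"),
    "smallest": ("gguf_q5_k_m", "onnx_int8"),
}


def profile_dirs(profile: str, root: str = "models") -> tuple:
    """Return (gguf_dir, onnx_dir) for a named profile."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    gguf_dir, onnx_dir = PROFILES[profile]
    return f"{root}/{gguf_dir}", f"{root}/{onnx_dir}"


gguf_dir, onnx_dir = profile_dirs("recommended")
print(gguf_dir, onnx_dir)
```

The first element is the value that would go into `model_dir` in config.json.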