Good Audio Generation space, model, dataset collection
-
Audio-to-Audio • Updated • 92.4k • 103 -
KittenML/kitten-tts-nano-0.1
Updated • 35.6k • 510 -
FunAudioLLM/ThinkSound
Video-to-Video • Updated • 52 -
ThinkSound
🔊 318 • Generate audio for a silent video using text prompts
-
Higgs Audio Demo
🎤 399 • Higgs Audio Demo
-
bosonai/higgs-audio-v2-generation-3B-base
Text-to-Speech • 6B • Updated • 466k • 669 -
Song Generation
🎵 708 • Generate a song from your lyrics and description
-
Vui
🏢 185 • NotebookLM conversational speech model
-
Hibiki Samples
🤗 53 • Translate speech in real-time with high fidelity
-
kyutai/moshiko-pytorch-bf16
Updated • 143k • 238 -
kyutai/mimi
Feature Extraction • 96.2M • Updated • 918k • 297 -
maya-research/Veena
Text-to-Speech • Updated • 3.44k • 230 -
MiniMax Speech Tech Report
🎙 104 • Generate high-quality speech from text with voice cloning
-
google/magenta-realtime
Updated • 263 • 546 -
PlayDiffusion
🎨 119 • Generate modified audio from text and voice
-
Qwen2.5 Omni 7B Demo
🏆 371 • Chat with AI using text, audio, images, and video
-
Open ASR Leaderboard
🏆 1.31k • Explore speech recognition model benchmarks and rankings
-
Open NotebookLM
🎙 143 • Generate a podcast to discuss the topic of your choice!
-
Voila Demo
💻 44 • Chat with a voice-clone AI
-
Voice Clone
🗣 2.63k • Generate speech in a cloned voice from reference audio
-
moonshotai/Kimi-Audio-7B-Instruct
Text-to-Speech • 10B • Updated • 41.8k • 391 -
moonshotai/Kimi-Audio-7B
Text-to-Speech • 10B • Updated • 91 • 78 -
Dia 1.6B
👯 1.76k • Generate realistic dialogue from a script, using Dia!
-
nari-labs/Dia-1.6B
Text-to-Speech • Updated • 86.9k • 2.84k -
ByteDance/MegaTTS3
Text-to-Speech • Updated • 119 • 417 -
Di♪♪Rhythm
🎶 687 • Blazingly Fast and Embarrassingly Simple Song Generation
-
Gemini Audio Video
♊ 35 • Gemini understands audio and video!
-
nvidia/diar_sortformer_4spk-v1
Automatic Speech Recognition • 0.1B • Updated • 5.37k • 137 -
ACE Step
😻 659 • A Step Towards Music Generation Foundation Model
-
ACE-Step/ACE-Step-v1-3.5B
Text-to-Audio • Updated • 727 -
stepfun-ai/Step-Audio-2-mini
Any-to-Any • Updated • 2.26k • 254 -
neuphonic/neutts-air
Text-to-Speech • 0.7B • Updated • 7.53k • 867 -
NeuTTS-Air
☁ 316 • Generate speech that mimics a reference voice
-
KaniTTS
😻 114 • Generate expressive speech from your text in seconds
-
microsoft/UserLM-8b
Text Generation • 8B • Updated • 1.58k • 365 -
pipecat-ai/smart-turn-v3
Voice Activity Detection • Updated • 141 -
meituan-longcat/LongCat-Audio-Codec
Updated • 41 -
Qwen3 TTS Voice Design
📈 110 • Generate custom voices from text using natural language prompts
-
Qwen TTS Clone Demo
👀 64 • Create a custom voice clone and synthesize speech
-
ResembleAI/chatterbox-turbo
Text-to-Speech • Updated • 639 -
Chatterbox Turbo Demo
⚡ 492 • Chatterbox Turbo Demo
-
zai-org/GLM-TTS
Text-to-Speech • Updated • 2.3k • 334 -
Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
Text-to-Speech • 2B • Updated • 1.47M • 1.41k -
Qwen3-TTS Demo
🎙 1.87k • Generate speech from text with custom voice, cloning, or presets
-
Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice
Text-to-Speech • 0.9B • Updated • 237k • 139 -
FlashLabs/Chroma-4B
Any-to-Any • Updated • 812 • 345 -
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning
Paper • 2601.11141 • Published • 23 -
MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization
Paper • 2601.01554 • Published • 59 -
FunAudioLLM/Fun-Audio-Chat-8B
Any-to-Any • 9B • Updated • 3.47k • 183 -
OpenMOSS-Team/MOSS-TTS-Nano-100M
Text-to-Speech • Updated • 22.7k • 120