meta-llama/Llama-3.2-11B-Vision-Instruct Image-Text-to-Text • 11B • Updated Dec 4, 2024 • 165k • 1.56k
MahmoudAshraf/mms-300m-1130-forced-aligner Automatic Speech Recognition • 0.3B • Updated Sep 28, 2024 • 2.77M • 74
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 284k • 1.57k