Automatic Speech Recognition
Transformers
Safetensors
VibeVoice
ASR
Transcription
Diarization
Speech-to-Text
Speaker diarization - which layers are responsible?
#26
by deathknight0 - opened
Just wondering which layers are responsible for speaker diarization in the model? I want to fine-tune it for domain-specific vocabulary only and would like to leave the layers responsible for speaker diarization alone. All my samples are single-speaker, and I do not want to cause catastrophic forgetting on diarization tasks.
Thanks in advance!
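To make the question concrete, here is a minimal PyTorch sketch of the kind of selective freezing described above. It uses a toy stand-in model, not the real checkpoint: the class `ToyASRModel`, its submodule names (`acoustic_encoder`, `speaker_head`, `text_decoder`) and the helper `freeze_by_prefix` are all hypothetical and are not taken from VibeVoice-ASR. On the real model, the actual module names would first have to be read from `named_children()` or `named_parameters()`.

```python
import torch
from torch import nn


class ToyASRModel(nn.Module):
    """Stand-in model; the submodule names are hypothetical, not VibeVoice-ASR's."""

    def __init__(self) -> None:
        super().__init__()
        self.acoustic_encoder = nn.Linear(16, 32)
        self.speaker_head = nn.Linear(32, 4)
        self.text_decoder = nn.Linear(32, 100)


def freeze_by_prefix(model: nn.Module, frozen_prefixes: tuple[str, ...]) -> list[str]:
    """Disable gradients for parameters whose name starts with a given prefix.

    Returns the names of the parameters that remain trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        if name.startswith(frozen_prefixes):
            param.requires_grad_(False)
        else:
            trainable.append(name)
    return trainable


model = ToyASRModel()

# List the top-level modules; on a real checkpoint this is one way to
# discover which names exist before deciding what to freeze.
top_level = [name for name, _ in model.named_children()]

# Freeze the (hypothetical) modules to be left untouched.
trainable_names = freeze_by_prefix(model, ("acoustic_encoder.", "speaker_head."))

# Hand only the still-trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
```

Note that this only shows the mechanics. It does not say whether diarization in this model is localized to particular modules at all. If it is spread across shared layers, freezing by name may not be enough to protect it.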