---
language: lth
tags:
- audio
- automatic-speech-recognition
- whisper
license: cc-by-nc-4.0
datasets:
- mozilla-foundation/common_voice_spontaneous_speech
---

# Whisper Large-v3 Fine-tuned for Thur

This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3), trained on the Thur (lth) portion of the Mozilla Common Voice Spontaneous Speech dataset.

## Training

- Base model: openai/whisper-large-v3
- Fine-tuning method: Full fine-tuning (seq2seq cross-entropy)
- Whisper language token: english
- Dataset: Mozilla Common Voice Spontaneous Speech

## Usage

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor
import torch

# Load the processor (feature extractor + tokenizer) and the fine-tuned model
processor = WhisperProcessor.from_pretrained("vitthalbhandari/whisper-large-v3-aft-all-lth")
model = WhisperForConditionalGeneration.from_pretrained("vitthalbhandari/whisper-large-v3-aft-all-lth")

# audio_array: 1-D mono waveform sampled at 16 kHz, supplied by the caller
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")

# Generate token IDs without tracking gradients
with torch.no_grad():
    generated_ids = model.generate(**inputs)

# batch_decode returns a list with one string per input clip
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)
```
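
The usage snippet expects `audio_array` to already hold a mono waveform at 16 kHz. As a minimal sketch of one way to obtain it, the helper below reads a 16-bit PCM WAV file using only the Python standard library. The name `load_wav_mono_16k` is hypothetical and not part of this model's release. It does not resample, so it raises an error if the file is not already at 16 kHz.

```python
import array
import sys
import wave


def load_wav_mono_16k(path, expected_rate=16000):
    """Read a 16-bit PCM WAV file into a list of floats in [-1.0, 1.0).

    Hypothetical helper, not part of this repository. It averages the
    channels to mono and raises if the file is not at `expected_rate`,
    because it performs no resampling.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM WAV files are supported")
        rate = wav.getframerate()
        if rate != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz audio, got {rate} Hz")
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Average the interleaved channels of each frame and scale to [-1.0, 1.0)
    mono = []
    for i in range(0, len(samples), channels):
        frame = samples[i:i + channels]
        mono.append(sum(frame) / (channels * 32768.0))
    return mono
```

The returned list can be used as `audio_array` in the snippet above, for example `audio_array = load_wav_mono_16k("sample.wav")`, where `sample.wav` is a placeholder path. Converting it with `numpy.asarray(audio_array, dtype="float32")` first is a reasonable option if an array is preferred. Audio recorded at another sampling rate needs to be resampled to 16 kHz beforehand with a tool of your choice.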