Upload tokenizer.json
#16
by jonatanklosko - opened
Generated with:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("openai/whisper-large-v3")
assert tokenizer.is_fast
tokenizer.save_pretrained("...")
As discussed with @ArthurZ on the PR, the fast tokenizer can always be loaded from the slow one: https://github.com/huggingface/transformers/pull/27338/files#r1384935617
So there's no issue with not having the tokenizer.json. That said, happy to merge this PR to improve clarity for the Hub weights.
@sanchit-gandhi yeah, the thing is that the Rust huggingface/tokenizers can only load tokenizer.json. In the Elixir ecosystem we have bindings to huggingface/tokenizers and so rely solely on fast tokenizers :)
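To illustrate the point about consumers that depend only on huggingface/tokenizers, here is a minimal sketch using the Python bindings of that library. It assumes the `tokenizers` package is installed and that the repository files have already been downloaded to a local directory. The helper names `find_tokenizer_json` and `load_with_tokenizers` are hypothetical and only there for illustration.

```python
from pathlib import Path


def find_tokenizer_json(model_dir):
    """Return the path to tokenizer.json inside model_dir, or None if it is absent."""
    candidate = Path(model_dir) / "tokenizer.json"
    return candidate if candidate.is_file() else None


def load_with_tokenizers(model_dir):
    """Load a tokenizer through the `tokenizers` bindings only, without transformers."""
    path = find_tokenizer_json(model_dir)
    if path is None:
        # Without transformers there is nothing to convert the slow-tokenizer
        # files, so a missing tokenizer.json is treated as an error here.
        raise FileNotFoundError(f"no tokenizer.json found in {model_dir}")
    # Imported lazily so the file check above works even without the package.
    from tokenizers import Tokenizer

    return Tokenizer.from_file(str(path))
```

With transformers available, the missing file is not a problem because the fast tokenizer can be rebuilt from the slow one. A consumer like the one sketched here has no such fallback, which is why shipping tokenizer.json in the repository helps.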
patrickvonplaten changed pull request status to merged