Speaker embeddings
Is there any way to get speaker embeddings, so that speakers can be matched to known speakers from previous transcripts?
I guess probably not, but I wanted to ask to make sure :)
Thank you for the model, and sorry for the questions!
Hi,
The model isn't able to produce speaker embeddings. The speaker numbers are based only on the order of appearance.
One possible way to keep speaker IDs consistent between segments is to concatenate audio segments of the known speakers before the segment to be decoded.
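A minimal sketch of that concatenation idea, in case it helps. It is not an official recipe: the helper name `prepend_known_speakers`, the 16 kHz mono float32 format, and the 0.5 s silence gap are all assumptions made for illustration. Since speaker numbers follow the order of appearance, the order of the reference clips is what should determine the numbering, though how reliably that holds in practice would need testing.

```python
import numpy as np

def prepend_known_speakers(reference_clips, target, sample_rate=16000, gap_seconds=0.5):
    """Build one waveform: a reference clip per known speaker, then the target.

    reference_clips: list of (name, waveform) pairs, in the order the speakers
        should appear (hypothetical input format, mono audio assumed).
    target: waveform of the segment that should actually be decoded.
    Returns (audio, target_offset_seconds, speaker_order).
    """
    # Short silence between clips so speaker turns stay clearly separated
    # (the gap length is an arbitrary choice).
    gap = np.zeros(int(round(gap_seconds * sample_rate)), dtype=np.float32)

    parts = []
    speaker_order = []
    for name, clip in reference_clips:
        parts.append(np.asarray(clip, dtype=np.float32))
        parts.append(gap)
        speaker_order.append(name)

    # Everything before the target is prefix audio; its length tells us where
    # the real segment starts, so the prefix transcript can be dropped later.
    prefix_samples = sum(len(p) for p in parts)
    parts.append(np.asarray(target, dtype=np.float32))

    audio = np.concatenate(parts)
    return audio, prefix_samples / sample_rate, speaker_order
```

The returned offset marks where the real segment begins, so the part of the transcript that belongs to the reference clips can be discarded, and `speaker_order[i]` is the name expected for the (i+1)-th speaker to appear.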
Hey, thank you for your answer. I may test it out: if I can get timestamped segments mapped to speakers, I could then generate speaker embeddings for them with pyannote or similar, to check whether it can replace my current pipeline (Whisper/Canary + pyannote, with a check afterwards against known speaker embeddings so that the transcript has names).
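For reference, the "check against known speaker embeddings" step could look roughly like the sketch below. It assumes the embeddings were already extracted by an external tool (pyannote or similar) and are plain vectors; `match_speaker`, the dictionary of known speakers, and the 0.7 threshold are hypothetical and would need tuning for a real embedding model.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def match_speaker(embedding, known_speakers, threshold=0.7):
    """Return (name, score) of the closest known speaker.

    known_speakers: dict mapping a speaker name to a reference embedding.
    The name is None when no known speaker reaches the threshold
    (0.7 is a placeholder value, not a recommendation).
    """
    best_name, best_score = None, -1.0
    for name, reference in known_speakers.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

A segment whose best score stays below the threshold would keep its generic speaker number in the transcript instead of a name.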