# Canary 180M Flash ONNX
Converted ONNX package of nvidia/canary-180m-flash for use with @asrjs/speech-recognition.
This repository is not the original NVIDIA training checkpoint repo. It contains exported runtime artifacts for browser and Node.js inference.
## Included Artifacts
- `encoder-model.onnx`
- `decoder-model.onnx`
- `encoder-model.fp16.onnx`
- `decoder-model.fp16.onnx`
- `encoder-model.int8.onnx`
- `decoder-model.int8.onnx`
- `tokenizer.json`
- `config.json`
## Model Summary
Canary is an encoder-decoder speech model with:
- FastConformer encoder
- Transformer decoder
- an aggregate multilingual tokenizer
- prompt/task tokens controlling language, task behavior, timestamps, and punctuation/capitalization
canary-180m-flash includes:
- 17 encoder layers
- 4 decoder layers
- about 182M parameters
Supported languages:
- English
- German
- Spanish
- French
Supported tasks:
- ASR
- speech translation
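The prompt/task tokens mentioned above steer the decoder toward a given language, task, and output style. The sketch below shows the general shape of such a prompt; the token names are illustrative placeholders, not the exact entries in this package's `tokenizer.json`.

```js
// Illustrative sketch of assembling a Canary-style decoder prompt.
// Token names here are placeholders, not the exact tokenizer vocabulary.
function buildPromptTokens({ sourceLang, task, targetLang, pnc, timestamps }) {
  return [
    '<|startoftranscript|>',
    `<|${sourceLang}|>`,
    task === 'translate' ? '<|translate|>' : '<|transcribe|>',
    `<|${targetLang}|>`,
    pnc ? '<|pnc|>' : '<|nopnc|>',
    timestamps ? '<|timestamps|>' : '<|notimestamps|>',
  ];
}

// English ASR with punctuation/capitalization, no timestamps:
const prompt = buildPromptTokens({
  sourceLang: 'en',
  task: 'transcribe',
  targetLang: 'en',
  pnc: true,
  timestamps: false,
});
```

For speech translation, the source and target language tokens differ and the task token switches to the translate variant.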
## Frontend / Preprocessing
The upstream model expects raw 16 kHz mono audio and uses a NeMo mel frontend internally.
For @asrjs/speech-recognition, this ONNX package is intended to run with the shared in-repo JavaScript NeMo mel frontend by default, so a dedicated ONNX preprocessor is not required for normal runtime usage.
Frontend assumptions:
- sample rate: 16000 Hz
- channels: mono
- mel bins: 128
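The feature geometry handed to the encoder can be estimated from standard NeMo-style framing. A 25 ms window and 10 ms hop are typical NeMo preprocessor defaults; treat them as assumptions here, since the in-repo JS frontend may pad or center differently.

```js
// Rough sketch of the mel feature geometry, assuming a 25 ms window and
// 10 ms hop (common NeMo defaults; this export's JS frontend may differ).
const SAMPLE_RATE = 16000;
const N_MELS = 128;
const WIN_SAMPLES = Math.round(0.025 * SAMPLE_RATE); // 400 samples
const HOP_SAMPLES = Math.round(0.010 * SAMPLE_RATE); // 160 samples

function melFrameCount(numSamples) {
  if (numSamples < WIN_SAMPLES) return 0;
  return Math.floor((numSamples - WIN_SAMPLES) / HOP_SAMPLES) + 1;
}

// One second of mono 16 kHz audio, without centering/padding:
const frames = melFrameCount(SAMPLE_RATE);       // 98 frames
const encoderInputShape = [1, N_MELS, frames];   // [batch, mel bins, frames]
```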
## Quantization Notes
Included variants:
- FP32
- FP16
- INT8 encoder
- INT8 decoder
Port validation summary on the smoke fixture:
- FP32: exact token/text parity
- FP16: exact token/text parity
- encoder-only INT8: exact token/text parity
- decoder-only INT8: exact token/text parity
- full INT8/INT8 (encoder + decoder): not exact; should not be treated as the default pairing
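A loader could guard against the one pairing the validation above flags. The helper below is hypothetical (it is not part of @asrjs/speech-recognition); its parameter names mirror the `encoderQuant`/`decoderQuant` options used later in this README.

```js
// Hypothetical guard reflecting the validation summary above: the listed
// pairings passed exact parity on the smoke fixture except full int8/int8.
// Mixed pairings not listed above (e.g. int8 encoder + fp16 decoder) were
// not individually reported, so this only flags the known-bad default.
const QUANTS = ['fp32', 'fp16', 'int8'];

function isValidatedPairing(encoderQuant, decoderQuant) {
  if (!QUANTS.includes(encoderQuant) || !QUANTS.includes(decoderQuant)) {
    throw new Error(`unknown quantization: ${encoderQuant}/${decoderQuant}`);
  }
  return !(encoderQuant === 'int8' && decoderQuant === 'int8');
}
```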
## Usage with @asrjs/speech-recognition
### Preset usage
```js
import { createSpeechPipeline, PcmAudioBuffer } from '@asrjs/speech-recognition';

const pipeline = createSpeechPipeline({ cacheModels: true });

const loaded = await pipeline.loadModel({
  preset: 'canary',
  modelId: 'nvidia/canary-180m-flash',
  backend: 'wasm',
});

const audio = PcmAudioBuffer.fromMono(pcmFloat32, 16000);
const result = await loaded.transcribe(audio, {
  detail: 'detailed',
  responseFlavor: 'canonical+native',
});

console.log(result.canonical.text);
```
### Direct source usage
```js
const loaded = await pipeline.loadModel({
  family: 'nemo-aed',
  modelId: 'nvidia/canary-180m-flash',
  backend: 'wasm',
  options: {
    source: {
      kind: 'huggingface',
      repoId: 'ysdede/canary-180m-flash-onnx',
      preprocessorBackend: 'js',
      encoderQuant: 'fp32',
      decoderQuant: 'fp32',
    },
  },
});
```
## Upstream Model and License
Original model: nvidia/canary-180m-flash

This converted package follows the upstream CC-BY-4.0 license terms.