Canary 180M Flash ONNX

Converted ONNX package of nvidia/canary-180m-flash for use with @asrjs/speech-recognition.

This repository is not the original NVIDIA training checkpoint repo. It contains exported runtime artifacts for browser and Node.js inference.

Included Artifacts

  • encoder-model.onnx
  • decoder-model.onnx
  • encoder-model.fp16.onnx
  • decoder-model.fp16.onnx
  • encoder-model.int8.onnx
  • decoder-model.int8.onnx
  • tokenizer.json
  • config.json
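The quantization level maps directly onto the artifact filenames above: FP32 files carry no suffix, while FP16 and INT8 variants add `.fp16` / `.int8`. A minimal sketch of that mapping (the helper name is illustrative, not part of @asrjs/speech-recognition):

```javascript
// Map a component and quantization level to an artifact filename in this repo.
// Hypothetical helper; the fp32 artifacts carry no quantization suffix.
function artifactName(component, quant) {
  if (component !== 'encoder' && component !== 'decoder') {
    throw new Error(`unknown component: ${component}`);
  }
  if (!['fp32', 'fp16', 'int8'].includes(quant)) {
    throw new Error(`unknown quantization: ${quant}`);
  }
  const suffix = quant === 'fp32' ? '' : `.${quant}`;
  return `${component}-model${suffix}.onnx`;
}
```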

Model Summary

Canary is an encoder-decoder speech model with:

  • FastConformer encoder
  • Transformer decoder
  • aggregate multilingual tokenizer
  • prompt/task tokens controlling language, task behavior, timestamps, and punctuation/capitalization

canary-180m-flash includes:

  • 17 encoder layers
  • 4 decoder layers
  • about 182M parameters

Supported languages:

  • English
  • German
  • Spanish
  • French

Supported tasks:

  • ASR
  • speech translation

Frontend / Preprocessing

The upstream model expects raw 16 kHz mono audio and uses a NeMo mel frontend internally.

For @asrjs/speech-recognition, this ONNX package is intended to run with the shared in-repo JavaScript NeMo mel frontend by default; a dedicated ONNX preprocessor is not required for normal runtime usage.

Frontend assumptions:

  • sample rate: 16000
  • mono audio
  • mel bins: 128
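In practice this means audio must reach the mel frontend as mono Float32 samples at 16 kHz. A minimal downmix/convert sketch, assuming interleaved Int16 stereo input that has already been resampled to 16 kHz (the function name is illustrative):

```javascript
// Convert interleaved Int16 stereo PCM to mono Float32 in [-1, 1].
// Resampling to 16 kHz is assumed to have happened upstream.
function toMonoFloat32(int16Interleaved) {
  const frames = int16Interleaved.length / 2;
  const out = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    const left = int16Interleaved[2 * i];
    const right = int16Interleaved[2 * i + 1];
    // Average the channels, then scale from the Int16 range to [-1, 1].
    out[i] = (left + right) / 2 / 32768;
  }
  return out;
}
```

The resulting Float32Array can be handed to `PcmAudioBuffer.fromMono(..., 16000)` as shown in the usage section below.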

Quantization Notes

Included variants:

  • FP32
  • FP16
  • INT8 encoder
  • INT8 decoder

Port validation summary on the smoke fixture:

  • FP32: exact token/text parity
  • FP16: exact token/text parity
  • encoder-only INT8: exact token/text parity
  • decoder-only INT8: exact token/text parity
  • full int8/int8: not exact, so it should not be treated as the default pairing
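Given those parity results, a loader can pair an INT8 component with an FP32 counterpart and stay within the validated combinations. A hedged sketch of that default-selection rule (the helper is not part of the library API):

```javascript
// Pick a decoder quantization that keeps the pairing within the
// parity-validated combinations: fp32/fp32, fp16/fp16, and int8 paired
// with fp32. Full int8/int8 is not exact, so it is never chosen here.
function defaultDecoderQuant(encoderQuant) {
  return encoderQuant === 'int8' ? 'fp32' : encoderQuant;
}
```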

Usage with @asrjs/speech-recognition

Preset usage

import { createSpeechPipeline, PcmAudioBuffer } from '@asrjs/speech-recognition';

const pipeline = createSpeechPipeline({ cacheModels: true });

const loaded = await pipeline.loadModel({
  preset: 'canary',
  modelId: 'nvidia/canary-180m-flash',
  backend: 'wasm',
});

const audio = PcmAudioBuffer.fromMono(pcmFloat32, 16000);
const result = await loaded.transcribe(audio, {
  detail: 'detailed',
  responseFlavor: 'canonical+native',
});

console.log(result.canonical.text);

Direct source usage

// Reuses `pipeline` from the preset example above.
const loaded = await pipeline.loadModel({
  family: 'nemo-aed',
  modelId: 'nvidia/canary-180m-flash',
  backend: 'wasm',
  options: {
    source: {
      kind: 'huggingface',
      repoId: 'ysdede/canary-180m-flash-onnx',
      preprocessorBackend: 'js',
      encoderQuant: 'fp32',
      decoderQuant: 'fp32',
    },
  },
});

Upstream Model and License

Original model: nvidia/canary-180m-flash

This converted package follows the upstream CC-BY-4.0 license terms.
