Canary 180M Flash ONNX

Converted ONNX package of nvidia/canary-180m-flash for use with @asrjs/speech-recognition.

This repository is not the original NVIDIA training checkpoint repo. It contains exported runtime artifacts for browser and Node.js inference.

Included Artifacts

  • encoder-model.onnx
  • decoder-model.onnx
  • encoder-model.fp16.onnx
  • decoder-model.fp16.onnx
  • encoder-model.int8.onnx
  • decoder-model.int8.onnx
  • tokenizer.json
  • config.json
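The quantization level maps directly onto the artifact filenames above: FP32 files carry no suffix, while FP16 and INT8 variants add `.fp16` / `.int8`. A minimal sketch of that mapping (the helper name is illustrative, not part of @asrjs/speech-recognition):

```javascript
// Map a component and quantization level to an artifact filename in this repo.
// Hypothetical helper; the fp32 artifacts carry no quantization suffix.
function artifactName(component, quant) {
  if (component !== 'encoder' && component !== 'decoder') {
    throw new Error(`unknown component: ${component}`);
  }
  if (!['fp32', 'fp16', 'int8'].includes(quant)) {
    throw new Error(`unknown quantization: ${quant}`);
  }
  const suffix = quant === 'fp32' ? '' : `.${quant}`;
  return `${component}-model${suffix}.onnx`;
}
```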

Model Summary

Canary is an encoder-decoder speech model with:

  • FastConformer encoder
  • Transformer decoder
  • aggregate multilingual tokenizer
  • prompt/task tokens controlling language, task behavior, timestamps, and punctuation/capitalization

canary-180m-flash includes:

  • 17 encoder layers
  • 4 decoder layers
  • about 182M parameters

Supported languages:

  • English
  • German
  • Spanish
  • French

Supported tasks:

  • ASR
  • speech translation

Frontend / Preprocessing

The upstream model expects raw 16 kHz mono audio and uses a NeMo mel frontend internally.

For @asrjs/speech-recognition, this ONNX package is intended to run with the shared in-repo JavaScript NeMo mel frontend by default; a dedicated ONNX preprocessor is not required for normal runtime usage.

Frontend assumptions:

  • sample rate: 16000
  • mono audio
  • mel bins: 128
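In practice this means audio must reach the mel frontend as mono Float32 samples at 16 kHz. A minimal downmix/convert sketch, assuming interleaved Int16 stereo input that has already been resampled to 16 kHz (the function name is illustrative):

```javascript
// Convert interleaved Int16 stereo PCM to mono Float32 in [-1, 1].
// Resampling to 16 kHz is assumed to have happened upstream.
function toMonoFloat32(int16Interleaved) {
  const frames = int16Interleaved.length / 2;
  const out = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    const left = int16Interleaved[2 * i];
    const right = int16Interleaved[2 * i + 1];
    // Average the channels, then scale from the Int16 range to [-1, 1].
    out[i] = (left + right) / 2 / 32768;
  }
  return out;
}
```

The resulting Float32Array can be handed to `PcmAudioBuffer.fromMono(..., 16000)` as shown in the usage section below.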

Quantization Notes

Included variants:

  • FP32
  • FP16
  • INT8 encoder
  • INT8 decoder

Port validation summary on the smoke fixture:

  • FP32: exact token/text parity
  • FP16: exact token/text parity
  • encoder-only INT8: exact token/text parity
  • decoder-only INT8: exact token/text parity
  • full int8/int8: not exact, so it should not be treated as the default pairing
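Given those parity results, a loader can pair an INT8 component with an FP32 counterpart and stay within the validated combinations. A hedged sketch of that default-selection rule (the helper is not part of the library API):

```javascript
// Pick a decoder quantization that keeps the pairing within the
// parity-validated combinations: fp32/fp32, fp16/fp16, and int8 paired
// with fp32. Full int8/int8 is not exact, so it is never chosen here.
function defaultDecoderQuant(encoderQuant) {
  return encoderQuant === 'int8' ? 'fp32' : encoderQuant;
}
```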

Usage with @asrjs/speech-recognition

Preset usage

import { createSpeechPipeline, PcmAudioBuffer } from '@asrjs/speech-recognition';

const pipeline = createSpeechPipeline({ cacheModels: true });

const loaded = await pipeline.loadModel({
  preset: 'canary',
  modelId: 'nvidia/canary-180m-flash',
  backend: 'wasm',
});

const audio = PcmAudioBuffer.fromMono(pcmFloat32, 16000);
const result = await loaded.transcribe(audio, {
  detail: 'detailed',
  responseFlavor: 'canonical+native',
});

console.log(result.canonical.text);

Direct source usage

// Reuses `pipeline` from the preset example above.
const loaded = await pipeline.loadModel({
  family: 'nemo-aed',
  modelId: 'nvidia/canary-180m-flash',
  backend: 'wasm',
  options: {
    source: {
      kind: 'huggingface',
      repoId: 'ysdede/canary-180m-flash-onnx',
      preprocessorBackend: 'js',
      encoderQuant: 'fp32',
      decoderQuant: 'fp32',
    },
  },
});

Upstream Model and License

Original model: nvidia/canary-180m-flash

This converted package follows the upstream CC-BY-4.0 license terms.
