---
license: apache-2.0
language:
  - en
  - zh
  - ja
  - ko
  - vi
  - th
  - id
  - ms
  - hi
  - ar
  - tr
  - ru
  - de
  - fr
  - es
  - multilingual
tags:
  - speech-recognition
  - asr
  - coreml
  - apple
  - ios
  - macos
  - qwen
  - audio
library_name: coreml
pipeline_tag: automatic-speech-recognition
base_model: Qwen/Qwen3-ASR-0.6B
---

# Qwen3-ASR 0.6B CoreML

Core ML conversion of [Qwen/Qwen3-ASR-0.6B](https://huggingface.co/Qwen/Qwen3-ASR-0.6B) for
on-device speech recognition on Apple platforms (iOS/macOS).

## Model Variants

| Variant | Size | Description |
|---------|------|-------------|
| `f32/` | ~2.5 GB | Full precision (Float32) - highest accuracy |
| `int8/` | ~0.7 GB | Quantized (Int8) - smaller, faster |

## Features

- **30+ languages** including English, Chinese, Japanese, Korean, and more
- **On-device inference** - no internet required
- **Autoregressive decoder** with KV-cache support
- Processes audio in 1-second chunks (100 mel frames)
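
The chunk size above can be sketched in code. This is a minimal illustration, not FluidAudio's implementation: it assumes 16 kHz mono input and a 10 ms mel hop (so 100 mel frames span 16,000 samples), and it assumes the final partial chunk is zero-padded. None of these details are stated in this card, and `chunkAudio` is a hypothetical helper.

```swift
// Assumptions (not stated in this card): 16 kHz mono input and a 10 ms mel hop,
// so 100 mel frames correspond to 16_000 samples.
let sampleRate = 16_000
let hopLength = 160          // samples per mel frame at a 10 ms hop (assumed)
let framesPerChunk = 100     // from the feature list above
let samplesPerChunk = framesPerChunk * hopLength

/// Hypothetical helper: splits samples into fixed-size chunks,
/// zero-padding the final chunk if it is short.
func chunkAudio(_ samples: [Float], chunkSize: Int) -> [[Float]] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var chunks: [[Float]] = []
    var start = 0
    while start < samples.count {
        let end = min(start + chunkSize, samples.count)
        var chunk = Array(samples[start..<end])
        if chunk.count < chunkSize {
            // Pad the tail with silence so every chunk has the same length.
            chunk.append(contentsOf: [Float](repeating: 0, count: chunkSize - chunk.count))
        }
        chunks.append(chunk)
        start = end
    }
    return chunks
}

// 2.5 seconds of silence yields three chunks; the last one is half padding.
let audio = [Float](repeating: 0, count: sampleRate * 5 / 2)
let chunks = chunkAudio(audio, chunkSize: samplesPerChunk)
print(chunks.count, chunks.last?.count ?? 0)
```

Fixed-size chunks keep the encoder input shape static, which is generally convenient for Core ML models compiled with fixed input dimensions.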

## Benchmarks (M4 Pro)

| Dataset | WER | CER | RTFx |
|---------|-----|-----|------|
| LibriSpeech test-clean (2620 files) | 4.4% | 1.9% | 2.8x |
| AISHELL-1 test (100 files) | 4.6% | 3.7% | 4.5x |

*Official PyTorch model: 2.11% WER on LibriSpeech test-clean*
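
For readers unfamiliar with the columns above, here is a rough sketch of how such metrics are commonly computed: WER and CER as edit distance divided by reference length, and RTFx as audio duration divided by processing time. The text normalization used for the reported numbers is not documented here, so the lowercasing and whitespace splitting below are assumptions, and all function names are illustrative.

```swift
/// Levenshtein distance between two sequences (substitutions, insertions, deletions).
func editDistance<T: Equatable>(_ ref: [T], _ hyp: [T]) -> Int {
    if ref.isEmpty { return hyp.count }
    if hyp.isEmpty { return ref.count }
    var previous = Array(0...hyp.count)
    for i in 1...ref.count {
        var current = [i] + [Int](repeating: 0, count: hyp.count)
        for j in 1...hyp.count {
            let substitution = previous[j - 1] + (ref[i - 1] == hyp[j - 1] ? 0 : 1)
            current[j] = min(substitution, previous[j] + 1, current[j - 1] + 1)
        }
        previous = current
    }
    return previous[hyp.count]
}

/// Word error rate: word-level edits divided by the number of reference words.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    // Assumed normalization: lowercase and split on spaces.
    let ref = reference.lowercased().split(separator: " ")
    let hyp = hypothesis.lowercased().split(separator: " ")
    guard !ref.isEmpty else { return hyp.isEmpty ? 0 : 1 }
    return Double(editDistance(ref, hyp)) / Double(ref.count)
}

/// Character error rate: character-level edits over reference characters, spaces ignored.
func characterErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = Array(reference.filter { $0 != " " })
    let hyp = Array(hypothesis.filter { $0 != " " })
    guard !ref.isEmpty else { return hyp.isEmpty ? 0 : 1 }
    return Double(editDistance(ref, hyp)) / Double(ref.count)
}

/// RTFx: seconds of audio processed per second of compute (higher is faster).
func realTimeFactor(audioSeconds: Double, processingSeconds: Double) -> Double {
    audioSeconds / processingSeconds
}

// One deleted word out of six reference words.
let wer = wordErrorRate(reference: "the cat sat on the mat",
                        hypothesis: "the cat sat on mat")
// 10 s of audio transcribed in 2.5 s of compute.
let rtfx = realTimeFactor(audioSeconds: 10, processingSeconds: 2.5)
print(wer, rtfx)
```

CER is usually the more meaningful metric for languages such as Chinese, where word boundaries are not marked by spaces.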

## Usage with FluidAudio

```swift
import FluidAudio

let manager = Qwen3AsrManager()
try await manager.loadModels()

let samples = try AudioConverter().resampleAudioFile(path: "audio.wav")
let transcript = try await manager.transcribe(
    audioSamples: samples,
    language: "en",
    maxNewTokens: 512
)
print(transcript)
```


## Model Architecture

- **Encoder**: audio encoder with Whisper-style mel spectrogram input
- **Decoder**: 28-layer transformer decoder with a hidden size of 1024
- **Tokenizer**: Qwen tokenizer with special ASR tokens
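
To illustrate how an autoregressive decoder with cached state and a `maxNewTokens` cap typically behaves, the following is a toy sketch of greedy decoding. It is not the converted model's decoding loop: `DecoderStep`, `greedyDecode`, and `toyStep` are hypothetical names, and the "cache" here is a plain array standing in for per-layer key/value tensors.

```swift
/// Hypothetical stand-in for one decoder step: consumes the latest token,
/// appends to the cache, and returns logits over the vocabulary.
typealias DecoderStep = (_ token: Int, _ cache: inout [[Float]]) -> [Float]

/// Greedy decoding: pick the highest-scoring token at each step until the
/// end token appears or maxNewTokens tokens have been produced.
func greedyDecode(
    startToken: Int,
    endToken: Int,
    maxNewTokens: Int,
    step: DecoderStep
) -> [Int] {
    var cache: [[Float]] = []   // grows by one entry per step, like a KV cache
    var token = startToken
    var output: [Int] = []
    for _ in 0..<max(0, maxNewTokens) {
        let logits = step(token, &cache)
        guard let best = logits.indices.max(by: { logits[$0] < logits[$1] }) else { break }
        if best == endToken { break }
        output.append(best)
        token = best            // only the new token is fed back; earlier state lives in the cache
    }
    return output
}

// Toy step over a 5-token vocabulary: always predicts (token + 1) % 5.
let toyStep: DecoderStep = { token, cache in
    cache.append([Float(token)])
    var logits = [Float](repeating: 0, count: 5)
    logits[(token + 1) % 5] = 1
    return logits
}

let tokens = greedyDecode(startToken: 0, endToken: 4, maxNewTokens: 512, step: toyStep)
print(tokens)
```

Because earlier state is kept in the cache, each step only processes the newest token instead of re-running the whole sequence.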


## License

Apache 2.0 - Same as the original Qwen3-ASR model.

## Credits

- Original model: [Qwen/Qwen3-ASR-0.6B](https://huggingface.co/Qwen/Qwen3-ASR-0.6B) by Alibaba Qwen Team
- Paper: [arXiv:2601.21337](https://arxiv.org/abs/2601.21337)
- CoreML conversion: [FluidInference/FluidAudio](https://github.com/FluidInference/FluidAudio)

## Citation

    @article{qwen3asr,
      title={Qwen3-ASR Technical Report},
      author={Qwen Team},
      journal={arXiv preprint arXiv:2601.21337},
      year={2026}
    }
  