Update README.md
README.md
CHANGED
@@ -207,7 +207,7 @@ The model is intended for users requiring speech-to-text transcription capabilit
 
 ### Release Date:
 
-Huggingface 07/
+Huggingface 07/17/2025 via https://huggingface.co/nvidia/canary-qwen-2.5b
 
 ## Model Architecture:
 Canary-Qwen is a Speech-Augmented Language Model (SALM) [9] model with FastConformer [2] Encoder and Transformer Decoder [3]. It is built using two base models: `nvidia/canary-1b-flash` [1,5] and `Qwen/Qwen3-1.7B` [4], a linear projection, and low-rank adaptation (LoRA) applied to the LLM. The audio encoder computes audio representation that is mapped to the LLM embedding space via a linear projection, and concatenated with the embeddings of text tokens. The model is prompted with "Transcribe the following: <audio>", using Qwen's chat template.
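
Reviewer note: the Model Architecture context line describes audio features being mapped into the LLM embedding space by a linear projection and then concatenated with text token embeddings. The sketch below illustrates only that data flow in plain Python. All dimensions, weights, and function names are hypothetical toy choices, not the model's real configuration, and appending the audio frames after the text tokens is an assumption made for simplicity.

```python
# Toy sketch: project audio encoder frames into the LLM embedding space,
# then concatenate them with text token embeddings.
# All sizes, values, and names are hypothetical, chosen only for illustration.

def linear_project(frames, weight, bias):
    """Apply y = W x + b to each audio frame (frames is a list of vectors)."""
    return [
        [sum(w * x for w, x in zip(row, frame)) + b for row, b in zip(weight, bias)]
        for frame in frames
    ]

def build_llm_input(text_embeddings, audio_frames, weight, bias):
    """Concatenate text embeddings and projected audio frames along the sequence axis.

    Assumption: audio frames are appended after the text tokens.
    """
    projected = linear_project(audio_frames, weight, bias)
    return text_embeddings + projected

# Hypothetical dimensions: encoder dim 3, LLM embedding dim 2.
audio_frames = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]       # 2 audio frames
weight = [[1.0, 0.0, 0.5], [0.0, 2.0, 0.0]]             # shape (2, 3)
bias = [0.0, 1.0]
text_embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 prompt tokens

llm_input = build_llm_input(text_embeddings, audio_frames, weight, bias)
print(len(llm_input), len(llm_input[0]))  # sequence length, embedding dim
```

The resulting sequence has one row per text token plus one per audio frame, all in the same embedding dimension, which is what lets a single decoder attend over both modalities.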