Safetensors · llama

uri-pareras committed · Commit 8d98d7e · verified · 1 parent: a26db65

Upload folder using huggingface_hub

README.md CHANGED
@@ -15,7 +15,6 @@ base_model:
   - BSC-LT/salamandraTA-7b-instruct
 ---
 
-
 <!-- ![](./images/salamandra_header.png) -->
 
 # SalamandraTAV-7b Model Card

@@ -59,7 +58,7 @@ The model is intended for both research and commercial use for the speech to tex
 
 ### Training Framework
 
-The code used to train SalamandraTAV-7b is based on the [Transformers](https://huggingface.co/docs/transformers/) library, and can be found in [github/speech_salamandra]().
 
 ### Compute Infrastructure
 

@@ -84,7 +83,6 @@ Training was conducted on 4 nodes, each with the following specifications:
 The easiest way to use the model is via the custom pipeline `multimodal_mt`:
 ```python
 from transformers import pipeline
-
 pipe = pipeline(
     task="multimodal_mt",
     model="langtech-veu/salamandra-TAV-7b",
@@ -122,14 +120,12 @@ translation = pipe(audio_path, mode="s2tt", src_lang="English", tgt_lang="Spanis
 Run the S2TT pipeline, specifying the target language:
 ```python
 transcription = pipe(audio_path, mode="asr", **generation_kwargs)
-
 translation = pipe(transcription, mode="t2tt", tgt_lang="Spanish", **generation_kwargs)
 ```
 
 Optionally, you can also specify the source language:
 ```python
 transcription = pipe(audio_path, mode="asr", src_lang="English", **generation_kwargs)
-
 translation = pipe(transcription, mode="t2tt", src_lang="English", tgt_lang="Spanish", **generation_kwargs)
 ```
 

@@ -140,14 +136,12 @@ This is a variant which uses a CoT mechanism to generate the translation by tran
 Run the S2TT pipeline, specifying the target language:
 ```python
 history = pipe(audio_path, return_chat_history=True, mode="asr", **generation_kwargs)
-
 translation = pipe(history, mode="t2tt", tgt_lang="Spanish", **generation_kwargs)
 ```
 
 Optionally, you can also specify the source language:
 ```python
 history = pipe(audio_path, return_chat_history=True, mode="asr", src_lang="English", **generation_kwargs)
-
 translation = pipe(history, mode="t2tt", src_lang="English", tgt_lang="Spanish", **generation_kwargs)
 ```
 
@@ -156,15 +150,23 @@ translation = pipe(history, mode="t2tt", src_lang="English", tgt_lang="Spanish",
 If you are interested in the intermediate results, you can retrieve them as follows:
 ```python
 history = pipe(audio_path, return_chat_history=True, mode="asr", **generation_kwargs)
-
 history = pipe(history, return_chat_history=True, mode="t2tt", tgt_lang="Spanish", **generation_kwargs)
-
-print(history.get_assistant_messages())
 ```
 
 
 ## Training Data
-![IMAGE](./images/data.png)
 
 ### Automatic Speech Recognition Data
 | Dataset | ast | ca | en | es | eu | gl | oc | pt | Total |
@@ -183,39 +185,35 @@ print(history.get_assistant_messages())
 | Total (hours) | 0.5h | 2010.5h | 6729.5h | 2497.5h | 544h | 181h | 0.5h | 184h | 12147.5h |
 
 
-![IMAGE](./images/data.png)
-
 ### Speech-To-Text Translation
 
-For the S2TT data, we extended the Common Voice Corpus 21.0 into the translation domain by translating the original transcriptions into all target languages supported by our model, using [salamandraTA-7b-instruct](https://huggingface.co/BSC-LT/salamandraTA-7b-instruct). To ensure quality, we filtered the synthesized samples using a [BLASER 2.0](https://huggingface.co/facebook/blaser-2.0-qe) score threshold of 3.75 and a [GlotLID v3](https://github.com/cisnlp/GlotLID) target language probability of at least 50%. This dataset will be released soon.
-> [!NOTE] Note that source audios are shared over all the target languages.
-
-| Common Voice Corpus 21.0 - SynthS2TT | ast | ca | en | es | eu | gl | oc | pt | Total (tgt) |
-|:-------------------------------------|:------|:------|:------|:------|:------|:------|:------|:------|:------|
-| **ast** | - | 13min | 19min | 16min | 7min | 12min | 9min | 17min | |
-| **ca** | | - | | | | | | | |
-| **en** | | | - | | | | | | |
-| **es** | | | | - | | | | | |
-| **eu** | | | | | - | | | | |
-| **gl** | | | | | | - | | | |
-| **oc** | | | | | | | - | | |
-| **pt** | | | | | | | | - | |
-| **Total (src)** | | | | | | | | | - |
-
-| Other Datasets | ca-en | en-ca | en-es | en-pt | es-en | es-pt | pt-en | pt-es | Total |
 |:----------------------------------------------------------------------------------------------------|:------|:------|:------|:------|:------|:------|:------|:------|:------|
 | [CoVoST 2](https://github.com/facebookresearch/covost/) (train) | 135.5h | 430h | | | 113h | | 10.5h | | 689h |
 | [Europarl-ST v1.1](https://www.mllp.upv.es/europarl-st/) (train) | | | 75.5h | 74h | 20.5h | 12.5h | 14.5h | 9.5h | 207h |
-| Total (hours) | | | | | | | | | |
-
 
-![IMAGE](./images/data.png)
 
 ### Text-To-Text Translation
 
-- None yet
 
-![IMAGE](./images/data.png)
 
 
@@ -272,23 +270,17 @@ transformation = jiwer.Compose([
 
 | SalamandraTAV | ast | ca | en | es | gl | pt | Avg (tgt) |
 |:--------------|-----------:|----------:|----------:|----------:|-----------:|-------------:|----------:|
-| **ast** | - | **26.5** | **27.2** | **17.3** | 13.5 | **22.2** | **21.3** |
-| **ca** | **21.2** | - | 39.8 | **23.3** | **28.4** | **30.5** | **28.6** |
-| **en** | **22.3** | **38.6** | - | **25.0** | **32.0** | 38.8 | 31.3 |
-| **es** | **13.3** | **23.1** | 25.1 | - | **18.2** | **21.1** | **20.2** |
-| **gl** | **17.4** | **28.3** | 32.2 | **20.9** | - | **24.1** | 24.6 |
-| **pt** | **18.8** | **32.3** | 36.1 | **21.3** | **22.0** | - | **26.1** |
-| **Avg (src)** | **18.6** | **29.8** | 32.1 | **21.6** | **22.8** | **27.3** | **25.4** |
-
-| SeamlessM4T | ast | ca | en | es | gl | pt | Avg (tgt) |
-|:--------------|-----------:|----------:|----------:|----------:|-----------:|-------------:|------:|
-| **ast** | - | 17.6 | 26.4 | 12.5 | **15.8** | 14.8 | 17.4 |
-| **ca** | - | - | **40.0** | 20.9 | 26.1 | 27.0 | 28.5 |
-| **en** | - | 36.7 | - | 23.7 | 29.7 | **43.0** | **33.3** |
-| **es** | - | 16.8 | **25.4** | - | 15.3 | 14.4 | 18.0 |
-| **gl** | - | 24.0 | **34.6** | 19.4 | - | 23.0 | **25.3** |
-| **pt** | - | 19.3 | **38.4** | 13.3 | 15.8 | - | 21.7 |
-| **Avg (src)** | - | 22.9 | **33.0** | 18.0 | 20.5 | 24.4 | 23.8 |
 
 The evaluation results reported here were obtained by manually running inference with the model on the test set. We verified that the results from SeamlessM4T are consistent with the [official results](https://huggingface.co/facebook/seamless-m4t-v2-large) reported by the authors.
 

@@ -296,23 +288,18 @@ The evaluation results reported here were obtained by manually running inference
 **XCOMET-XL**
 
 
-| SalamandraTAV | ca | en | es | gl | pt | Avg (tgt) |
-|:---------------|-----------:|--------:|------------:|------------:|------------:|-----------:|
-| **ca** | - | 0.9291 | **0.9283** | **0.9189** | **0.9253** | **0.9254** |
-| **en** | **0.9030** | - | **0.9093** | **0.9078** | **0.9164** | **0.9091** |
-| **es** | **0.9183** | 0.9263 | - | **0.9214** | **0.9248** | **0.9227** |
-| **gl** | **0.8994** | 0.9037 | **0.9029** | - | **0.9009** | **0.9017** |
-| **pt** | **0.8645** | 0.8851 | **0.8789** | **0.8794** | - | **0.8770** |
-| **Avg (src)** | **0.8963** | 0.9111 | **0.9049** | **0.9069** | **0.9169** | **0.9072** |
-
-| SeamlessM4T | ca | en | es | gl | pt | Avg (tgt) |
-|:---------------|--------:|------------:|--------:|--------:|--------:|-------:|
-| **ca** | - | **0.9340** | 0.9042 | 0.9046 | 0.8875 | 0.9076 |
-| **en** | 0.8837 | - | 0.8941 | 0.8878 | 0.8982 | 0.8910 |
-| **es** | 0.8523 | **0.9132** | - | 0.8754 | 0.8431 | 0.8710 |
-| **gl** | 0.8617 | **0.9238** | 0.8775 | - | 0.8743 | 0.8843 |
-| **pt** | 0.7687 | **0.9068** | 0.7983 | 0.8207 | - | 0.8236 |
-| **Avg (src)** | 0.8416 | **0.9195** | 0.8685 | 0.8721 | 0.8758 | 0.8755 |
 
 XCOMET-XL results for Asturian are not reported because it is not supported by this metric.
 
@@ -327,37 +314,29 @@ XCOMET-XL results for Asturian are not reported because it is not supported by t
 
 | SalamandraTAV | ca | en |
 |:--------------|----------:|----------:|
-| **ca** | - | **37.1** |
 | **en** | 41.1 | - |
 | **es** | - | **43.9** |
-| **pt** | - | 47.9 |
-| **Avg (src)** | - | 43.0 |
 
-| SeamlessM4T | ca | en |
-|:--------------|----------:|----------:|
-| **ca** | - | 36.8 |
-| **en** | **44.0** | - |
-| **es** | - | 42.2 |
-| **pt** | - | **54.1** |
-| **Avg (src)** | - | **44.4** |
 
 **XCOMET-XL**
 
 | SalamandraTAV | ca | en |
 |:--------------|----------:|----------:|
-| **ca** | - | 0.8834 |
-| **en** | 0.8450 | - |
 | **es** | - | 0.9241 |
-| **pt** | - | 0.8990 |
-| **Avg (src)** | - | 0.9022 |
 
-| SeamlessM4T | ca | en |
-|:--------------|----------:|-----------:|
-| **ca** | - | **0.8901** |
-| **en** | **0.9086** | - |
-| **es** | - | **0.9383** |
-| **pt** | - | **0.9530** |
-| **Avg (src)** | - | **0.9271** |
 
 </details>
 

@@ -372,25 +351,23 @@ Mintzai-ST has overlap with basque_parliament_1, with which we have trained our
 
 | SalamandraTAV | es | eu |
 |:--------------|----------:|----------:|
-| **es** | - | 1.2 |
-| **eu** | **23.7** | - |
 
-| SeamlessM4T | es | eu |
-|:--------------|----------:|----------:|
-| **es** | - | **12.5** |
-| **eu** | 21.0 | - |
 
 **XCOMET-XL**
 
 | SalamandraTAV | es | eu |
 |:--------------|----------:|----------:|
-| **es** | - | **0.7893** |
-| **eu** | **0.8300** | - |
 
-| SeamlessM4T | es | eu |
-|:--------------|----------:|----------:|
-| **es** | - | 0.6682 |
-| **eu** | 0.7185 | - |
 
 </details>
 
@@ -403,22 +380,14 @@ Mintzai-ST has overlap with basque_parliament_1, with which we have trained our
 
 **WER**
 
-| SalamandraTAV | |
-|:--------------|----------:|
-| ast | **28.82** |
-| ca | **7.37** |
-| en | 16.63 |
-| es | 7.59 |
-| gl | 7.42 |
-| pt | 22.38 |
-
-| SeamlessM4T | |
-|:--------------|----------:|
-| ca | 7.89 |
-| en | **9.15** |
-| es | **5.64** |
-| gl | **7.38** |
-| pt | **19.62** |
 
 </details>
 

@@ -429,22 +398,15 @@ Mintzai-ST has overlap with basque_parliament_1, with which we have trained our
 
 **WER**
 
-| SalamandraTAV | |
-|:--------------|----------:|
-| ast | **25.20** |
-| ca | 7.61 |
-| en | 7.79 |
-| es | 5.87 |
-| gl | 10.43 |
-| pt | 10.21 |
-
-| SeamlessM4T | |
-|:--------------|----------:|
-| ca | **5.74** |
-| en | **7.66** |
-| es | **5.30** |
-| gl | **8.00** |
-| pt | **7.94** |
 
 </details>
 

@@ -455,15 +417,11 @@ Mintzai-ST has overlap with basque_parliament_1, with which we have trained our
 
 **WER**
 
-| SalamandraTAV | |
-|:--------------|----------:|
-| es | **8.24** |
-| eu | **19.08** |
 
-| SeamlessM4T | |
-|:--------------|----------:|
-| es | 9.24 |
-| eu | 25.69 |
 
 </details>
 
@@ -478,18 +436,12 @@ For further information, please send an email to <[email protected]>.
 ### Copyright
 Copyright (c) 2025 by Language Technologies Lab, Barcelona Supercomputing Center.
 
-[ ] Fede
-
 ### Funding
-This work has been promoted and financed by the Government of Catalonia through ???.
-
-[ ] Fede
 
 ### Acknowledgements
 
-To do...
-
-[ ] Fede
 
 ### Disclaimer
 Be aware that the model may contain biases or other unintended distortions.

@@ -499,8 +451,6 @@ including those governing the use of Artificial Intelligence.
 
 The Barcelona Supercomputing Center, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.
 
-[ ] Fede
-
 ### License
 [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 

@@ -510,11 +460,11 @@ The Barcelona Supercomputing Center, as the owner and creator of the model, shal
 If you find our model useful, we would appreciate it if you could cite our work as follows:
 
 ```
-@misc{bsclt2025,
-  author = {Authors},
-  title = {SalamandraTAV-7b},
-  year = {2025},
-  url = {url},
-  publisher = {Hugging Face}
 }
 ```
  - BSC-LT/salamandraTA-7b-instruct
 ---
 
 <!-- ![](./images/salamandra_header.png) -->
 
 # SalamandraTAV-7b Model Card
 
 ### Training Framework
 
+The code used to train SalamandraTAV-7b is based on the [Transformers](https://huggingface.co/docs/transformers/) library, and will be publicly available soon.
 
 ### Compute Infrastructure
 
 The easiest way to use the model is via the custom pipeline `multimodal_mt`:
 ```python
 from transformers import pipeline
 pipe = pipeline(
     task="multimodal_mt",
     model="langtech-veu/salamandra-TAV-7b",
 
 Run the S2TT pipeline, specifying the target language:
 ```python
 transcription = pipe(audio_path, mode="asr", **generation_kwargs)
 translation = pipe(transcription, mode="t2tt", tgt_lang="Spanish", **generation_kwargs)
 ```
 
 Optionally, you can also specify the source language:
 ```python
 transcription = pipe(audio_path, mode="asr", src_lang="English", **generation_kwargs)
 translation = pipe(transcription, mode="t2tt", src_lang="English", tgt_lang="Spanish", **generation_kwargs)
 ```
 
 Run the S2TT pipeline, specifying the target language:
 ```python
 history = pipe(audio_path, return_chat_history=True, mode="asr", **generation_kwargs)
 translation = pipe(history, mode="t2tt", tgt_lang="Spanish", **generation_kwargs)
 ```
 
 Optionally, you can also specify the source language:
 ```python
 history = pipe(audio_path, return_chat_history=True, mode="asr", src_lang="English", **generation_kwargs)
 translation = pipe(history, mode="t2tt", src_lang="English", tgt_lang="Spanish", **generation_kwargs)
 ```
 
 If you are interested in the intermediate results, you can retrieve them as follows:
 ```python
 history = pipe(audio_path, return_chat_history=True, mode="asr", **generation_kwargs)
+transcription = history.get_assistant_messages()[-1]
 history = pipe(history, return_chat_history=True, mode="t2tt", tgt_lang="Spanish", **generation_kwargs)
+translation = history.get_assistant_messages()[-1]
+history = pipe(history, return_chat_history=True, mode="lid", **generation_kwargs)
+src_language = history.get_assistant_messages()[-1]
 ```
 
 
 ## Training Data
+
+### Global Summary
+
+| Data Type | Hours | Samples | Tokens (target) | Tokens (total) |
+|:--------------|:----------|:-----------|:------------------|:---------------|
+| **ASR** | 12,147.5h | 5,207,686 | 582,567,674 | 4,180,709,878 |
+| **S2TT** | 896h | 556,664 | 28,297,402 | 153,376,912 |
+| **T2TT** | - | 2,242,354 | 112,837,123 | 220,328,525 |
 
 ### Automatic Speech Recognition Data
 | Dataset | ast | ca | en | es | eu | gl | oc | pt | Total |
 | Total (hours) | 0.5h | 2010.5h | 6729.5h | 2497.5h | 544h | 181h | 0.5h | 184h | 12147.5h |
 
 
 ### Speech-To-Text Translation
 
+| Dataset | ca-en | en-ca | en-es | en-pt | es-en | es-pt | pt-en | pt-es | Total |
 |:----------------------------------------------------------------------------------------------------|:------|:------|:------|:------|:------|:------|:------|:------|:------|
 | [CoVoST 2](https://github.com/facebookresearch/covost/) (train) | 135.5h | 430h | | | 113h | | 10.5h | | 689h |
 | [Europarl-ST v1.1](https://www.mllp.upv.es/europarl-st/) (train) | | | 75.5h | 74h | 20.5h | 12.5h | 14.5h | 9.5h | 207h |
+| Total (hours) | 135.5h | 430h | 75.5h | 74h | 133.5h | 12.5h | 25h | 9.5h | 896h |
 
 ### Text-To-Text Translation
 
+For the T2TT data, we filtered the [Wikimedia](https://dumps.wikimedia.org/other/contenttranslation/20250801/) and [Tatoeba](https://downloads.tatoeba.org/exports/per_language/) datasets using the following criteria:
+1. The sample obtained a [GlotLID v3](https://github.com/cisnlp/GlotLID) target language probability of at least 50%.
+2. The sample contains between 5 and 100 words.
+3. The sample obtained a [BLASER 2.0](https://huggingface.co/facebook/blaser-2.0-qe) score higher than 3.75.
+
+This yields 102,845,818 target tokens for Wikimedia and 9,991,305 target tokens for Tatoeba, with the following language distribution:
+
+| Language | As Source (samples) | (%) | As Target (samples) | (%) |
+|:---------|--------------------:|:--------------|--------------------:|:--------------|
+| **ast** | 848 | 0.0% | 11,800 | 0.5% |
+| **ca** | 44,278 | 2.0% | 272,250 | 12.1% |
+| **en** | 1,353,784 | 60.4% | 412,166 | 18.4% |
+| **es** | 533,330 | 23.8% | 862,276 | 38.5% |
+| **eu** | 5,266 | 0.2% | 79,534 | 3.5% |
+| **gl** | 14,052 | 0.6% | 85,510 | 3.8% |
+| **pt** | 290,754 | 13.0% | 518,776 | 23.1% |
+| **Total**| 2,242,354 | 100.0% | 2,242,354 | 100.0% |
 
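The three filtering criteria above can be sketched as a single predicate. In this minimal sketch, `glotlid_prob` and `blaser_qe_score` are hypothetical callables standing in for the real GlotLID v3 and BLASER 2.0 models, which are not reproduced here:

```python
# Hedged sketch of the T2TT filtering criteria; `glotlid_prob` and
# `blaser_qe_score` are hypothetical stand-ins for the real scorers.
def keep_sample(src, tgt, glotlid_prob, blaser_qe_score):
    """Return True if a parallel (src, tgt) sample passes all three filters."""
    if glotlid_prob(tgt) < 0.50:                  # 1. target language probability >= 50%
        return False
    if not 5 <= len(tgt.split()) <= 100:          # 2. between 5 and 100 words
        return False
    if blaser_qe_score(src, tgt) <= 3.75:         # 3. BLASER 2.0 QE score > 3.75
        return False
    return True
```

In practice each criterion would be applied in increasing order of cost (length check first, then language ID, then the quality-estimation model).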
 
 
 
 | SalamandraTAV | ast | ca | en | es | gl | pt | Avg (tgt) |
 |:--------------|-----------:|----------:|----------:|----------:|-----------:|-------------:|----------:|
+| **ast** | - | **26.4** | **27.7** | **17.1** | **19.7** | **21.6** | **22.5** |
+| **ca** | **22.9** | - | 39.7 | **22.2** | **29.8** | **31.0** | **29.1** |
+| **en** | **24.3** | **39.1** | - | **25.5** | **32.3** | **43.0** | **32.8** |
+| **es** | **13.7** | **20.8** | 24.8 | - | **18.8** | **19.1** | **19.4** |
+| **gl** | **18.1** | **29.4** | 32.4 | **20.6** | - | **26.0** | **25.3** |
+| **pt** | **19.9** | **30.6** | 37.0 | **20.6** | **25.6** | - | **26.7** |
+| **Avg (src)** | **19.8** | **29.3** | 32.3 | **21.2** | **25.2** | **28.1** | **26.0** |
+
+**SalamandraTAV vs SeamlessM4T BLEU Difference**
+
+![](./images/s2tt-fleurs-bleu.png)
 
 The evaluation results reported here were obtained by manually running inference with the model on the test set. We verified that the results from SeamlessM4T are consistent with the [official results](https://huggingface.co/facebook/seamless-m4t-v2-large) reported by the authors.
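The Avg (src) and Avg (tgt) entries in these matrices are plain arithmetic means over the filled cells of a column or row, skipping the `-` diagonal. A small illustrative sketch of that convention, using only a subset of the BLEU scores above for illustration:

```python
# Illustrative: row/column averages that skip the missing '-' diagonal,
# matching the Avg (src) / Avg (tgt) convention of the tables above.
# `scores` maps (source, target) pairs to BLEU; subset for illustration only.
scores = {
    ("en", "es"): 25.5,
    ("en", "ca"): 39.1,
    ("es", "en"): 24.8,
    ("es", "ca"): 20.8,
}

def avg_tgt(src, scores):
    """Mean over all targets for a given source (the row's Avg (tgt) entry)."""
    vals = [v for (s, _), v in scores.items() if s == src]
    return sum(vals) / len(vals)

def avg_src(tgt, scores):
    """Mean over all sources for a given target (the Avg (src) row entry)."""
    vals = [v for (_, t), v in scores.items() if t == tgt]
    return sum(vals) / len(vals)

print(round(avg_tgt("en", scores), 1))  # mean of 25.5 and 39.1 -> 32.3
```

Only the available pairs contribute; no zero is imputed for the diagonal.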
 
 
 **XCOMET-XL**
 
+| SalamandraTAV | ca | en | es | gl | pt | Avg (tgt) |
+|:---------------|-----------:|-----------:|------------:|------------:|------------:|-----------:|
+| **ca** | - | 0.9289 | **0.9114** | **0.9202** | **0.9136** | **0.9185** |
+| **en** | **0.8987** | - | **0.9045** | **0.8981** | **0.9077** | **0.9023** |
+| **es** | **0.9013** | **0.9257** | - | **0.9162** | **0.9110** | **0.9136** |
+| **gl** | **0.8930** | 0.9024 | **0.8890** | - | **0.9005** | **0.8962** |
+| **pt** | **0.8697** | 0.8912 | **0.8775** | **0.8862** | - | **0.8812** |
+| **Avg (src)** | **0.8907** | 0.9121 | **0.8956** | **0.9052** | **0.9082** | **0.9023** |
+
+**SalamandraTAV vs SeamlessM4T XCOMET-XL Difference**
+
+![](./images/s2tt-fleurs-xcomet.png)
 
 XCOMET-XL results for Asturian are not reported because it is not supported by this metric.
 
 
 
 | SalamandraTAV | ca | en |
 |:--------------|----------:|----------:|
+| **ca** | - | **37.3** |
 | **en** | 41.1 | - |
 | **es** | - | **43.9** |
+| **pt** | - | 49.0 |
+| **Avg (src)** | 41.1 | 43.4 |
+
+**SalamandraTAV vs SeamlessM4T BLEU Difference**
+
+![](./images/s2tt-covost-bleu.png)
 
 **XCOMET-XL**
 
 | SalamandraTAV | ca | en |
 |:--------------|----------:|----------:|
+| **ca** | - | 0.8835 |
+| **en** | 0.8454 | - |
 | **es** | - | 0.9241 |
+| **pt** | - | 0.8967 |
+| **Avg (src)** | 0.8454 | 0.9014 |
+
+**SalamandraTAV vs SeamlessM4T XCOMET-XL Difference**
+
+![](./images/s2tt-covost-xcomet.png)
 
 </details>
 
 
 
 | SalamandraTAV | es | eu |
 |:--------------|----------:|----------:|
+| **es** | - | **21.7** |
+| **eu** | **26.8** | - |
+
+**SalamandraTAV vs SeamlessM4T BLEU Difference**
+
+![](./images/s2tt-mintzai-bleu.png)
 
 **XCOMET-XL**
 
 | SalamandraTAV | es | eu |
 |:--------------|----------:|----------:|
+| **es** | - | **0.8000** |
+| **eu** | **0.8329** | - |
+
+**SalamandraTAV vs SeamlessM4T XCOMET-XL Difference**
+
+![](./images/s2tt-mintzai-xcomet.png)
 
 </details>
 
 
 
 **WER**
 
+| | SalamandraTAV | SeamlessM4T | Whisper v3 | Spire |
+|:--------------|--------------:|------------:|------------:|------:|
+| **ast** | **29.35** | - | - | - |
+| **ca** | **7.34** | 7.89 | 14.11 | - |
+| **en** | 16.65 | **9.15** | 11.13 | 22.56 |
+| **es** | 7.72 | 5.64 | **5.21** | - |
+| **gl** | 7.83 | **7.38** | 14.50 | - |
+| **pt** | 21.80 | 19.62 | **6.85** | - |
 
 </details>
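The WER figures above are word-level edit distance divided by the reference length; the reported scores additionally apply a `jiwer` normalization pipeline (the `jiwer.Compose` transformation referenced earlier). A minimal self-contained sketch of the bare metric, without that normalization:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution (or match)
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)  # deletion / insertion
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion over 6 reference words
```

Because WER divides by the reference length, it can exceed 1.0 for hypotheses with many insertions.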
 
 **WER**
 
+| | SalamandraTAV | SeamlessM4T | Whisper v3 | Spire |
+|:--------------|--------------:|------------:|------------:|------:|
+| **ast** | **25.68** | - | - | - |
+| **ca** | 7.34 | 5.74 | **4.88** | - |
+| **en** | 8.35 | 7.66 | **4.81** | 9.07 |
+| **es** | 6.04 | 5.30 | **2.95** | - |
+| **gl** | 11.83 | **8.00** | 13.61 | - |
+| **pt** | 10.55 | 7.94 | **3.97** | - |
 
 </details>
 
 
 
 **WER**
 
+| | SalamandraTAV | SeamlessM4T | Whisper v3 |
+|:--------------|--------------:|------------:|------------:|
+| **es** | 8.21 | 9.24 | **7.37** |
+| **eu** | **19.34** | 25.69 | 48.64 |
 
 </details>
 
 
 
 ### Copyright
 Copyright (c) 2025 by Language Technologies Lab, Barcelona Supercomputing Center.
 
 ### Funding
+This work has been funded by the Ministerio para la Transformación Digital y de la Función Pública and by the EU (NextGenerationEU) within the framework of the project Modelos del Lenguaje.
 
 ### Acknowledgements
 
+The authors gratefully acknowledge the computer resources at MareNostrum and the technical support provided by the Barcelona Supercomputing Center (RES-IM-2025-2-0027).
 
 ### Disclaimer
 Be aware that the model may contain biases or other unintended distortions.
 
 The Barcelona Supercomputing Center, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.
 
 ### License
 [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
 If you find our model useful, we would appreciate it if you could cite our work as follows:
 
 ```
+@misc{bsclt2025salamandraTAV7b,
+  title={salamandra-TAV-7b: a Speech-To-Text Translation model based on an end-to-end Speech LLM for Iberian Languages},
+  author={…; España-Bonet, Cristina},
+  organization={Barcelona Supercomputing Center},
+  url={https://huggingface.co/langtech-veu/salamandra-TAV-7b},
+  year={2025}
 }
 ```
images/s2tt-covost-bleu.png ADDED
images/s2tt-covost-xcomet.png ADDED
images/s2tt-fleurs-bleu.png ADDED
images/s2tt-fleurs-xcomet.png ADDED
images/s2tt-mintzai-bleu.png ADDED
images/s2tt-mintzai-xcomet.png ADDED