Upload folder using huggingface_hub

Files changed (7)

README.md +107 -157
images/s2tt-covost-bleu.png +0 -0
images/s2tt-covost-xcomet.png +0 -0
images/s2tt-fleurs-bleu.png +0 -0
images/s2tt-fleurs-xcomet.png +0 -0
images/s2tt-mintzai-bleu.png +0 -0
images/s2tt-mintzai-xcomet.png +0 -0

README.md CHANGED

@@ -15,7 +15,6 @@ base_model:
 - BSC-LT/salamandraTA-7b-instruct
 ---
 <!-- ![](./images/salamandra_header.png) -->
 # SalamandraTAV-7b Model Card
@@ -59,7 +58,7 @@ The model is intended for both research and commercial use for the speech to tex
 ### Training Framework
-The code used to train SalamandraTAV-7b is based on the [Transformers](https://huggingface.co/docs/transformers/) library, and can be found in [github/speech_salamandra]().
 ### Compute Infrastructure
@@ -84,7 +83,6 @@ Training was conducted on 4 nodes, each with the following specifications:
 The easiest way to use the model is using the custom pipeline `multimodal_mt`:
 ```python
 from transformers import pipeline
 pipe = pipeline(
   task="multimodal_mt",
   model="langtech-veu/salamandra-TAV-7b",
@@ -122,14 +120,12 @@ translation = pipe(audio_path, mode="s2tt", src_lang="English", tgt_lang="Spanis
 Run the S2TT pipeline, specifying the target language:
 ```python
 transcription = pipe(audio_path, mode="asr", **generation_kwargs)
 translation = pipe(transcription, mode="t2tt", tgt_lang="Spanish", **generation_kwargs)
 ```
 Optionally, you can also specify the source language:
 ```python
 transcription = pipe(audio_path, mode="asr", src_lang="English", **generation_kwargs)
 translation = pipe(transcription, mode="t2tt", src_lang="English", tgt_lang="Spanish", **generation_kwargs)
 ```
@@ -140,14 +136,12 @@ This is a variant which uses a CoT mechanism to generate the translation by tran
 Run the S2TT pipeline, specifying the target language:
 ```python
 history = pipe(audio_path, return_chat_history=True, mode="asr", **generation_kwargs)
 translation = pipe(history, mode="t2tt", tgt_lang="Spanish", **generation_kwargs)
 ```
 Optionally, you can also specify the source language:
 ```python
 history = pipe(audio_path, return_chat_history=True, mode="asr", src_lang="English", **generation_kwargs)
 translation = pipe(history, mode="t2tt", src_lang="English", tgt_lang="Spanish", **generation_kwargs)
 ```
@@ -156,15 +150,23 @@ translation = pipe(history, mode="t2tt", src_lang="English", tgt_lang="Spanish",
 If you are interested in getting the intermediate results, you can do it as follows:
 ```python
 history = pipe(audio_path, return_chat_history=True, mode="asr", **generation_kwargs)
 history = pipe(history, return_chat_history=True, mode="t2tt", tgt_lang="Spanish", **generation_kwargs)
-print(history.get_assistant_messages())
 ```
 ## Training Data
-![IMAGE](./images/data.png)
 ### Automatic Speech Recognition Data
 |  Dataset                                                                                            |  ast   |  ca   |  en   |  es   |  eu   |  gl   |  oc   |  pt   | Total |
@@ -183,39 +185,35 @@ print(history.get_assistant_messages())
 |  Total (hours)                                                                                      |  0.5h  |2010.5h|6729.5h|2497.5h|  544h |  181h |  0.5h |  184h |12147.5h|
-![IMAGE](./images/data.png)
 ### Speech-To-Text Translation
-For the S2TT data, we extended the Common Voice Corpus 21.0 into the translation domain by translating the original transcriptions into all target languages supported by our model, using [salamandraTA-7b-instruct](https://huggingface.co/BSC-LT/salamandraTA-7b-instruct). To ensure quality, we filtered the synthesized samples using a [BLASER 2.0](https://huggingface.co/facebook/blaser-2.0-qe) score threshold of 3.75 and a [GlotLID v3](https://github.com/cisnlp/GlotLID) target language probability of at least 50%. This dataset will be released soon.
-> [!NOTE] Note that source audios are shared over all the target languages.
-| Common Voice Corpus 21.0 - SynthS2TT |  ast  |   ca  |   en  |   es  |   eu  |   gl  |   oc  |   pt  | Total (tgt) |
-|:-------------------------------------|:------|:------|:------|:------|:------|:------|:------|:------|:------|
-|  **ast**                             |   -   | 13min | 19min | 16min |  7min | 12min |  9min | 17min |       |
-|  **ca**                              |       |   -   |       |       |       |       |       |       |       |
-|  **en**                              |       |       |   -   |       |       |       |       |       |       |
-|  **es**                              |       |       |       |   -   |       |       |       |       |       |
-|  **eu**                              |       |       |       |       |   -   |       |       |       |       |
-|  **gl**                              |       |       |       |       |       |   -   |       |       |       |
-|  **oc**                              |       |       |       |       |       |       |   -   |       |       |
-|  **pt**                              |       |       |       |       |       |       |       |   -   |       |
-|  **Total (src)**                           |       |       |       |       |       |       |       |       |   -   |
-|  Other Datasets                                                                                     | ca-en | en-ca | en-es | en-pt | es-en | es-pt | pt-en | pt-es | Total |
 |:----------------------------------------------------------------------------------------------------|:------|:------|:------|:------|:------|:------|:------|:------|:------|
 | [CoVoST 2](https://github.com/facebookresearch/covost/) (train)                                     | 135.5h|  430h |       |       |  113h |       | 10.5h |       |  689h |
 | [Europarl-ST v1.1](https://www.mllp.upv.es/europarl-st/) (train)                                    |       |       | 75.5h |  74h  | 20.5h | 12.5h | 14.5h |  9.5h |  207h |
-|  Total (hours)                                                                                      |       |       |       |       |       |       |       |       |       |
-![IMAGE](./images/data.png)
 ### Text-To-Text Translation
-- None yet
-![IMAGE](./images/data.png)
@@ -272,23 +270,17 @@ transformation = jiwer.Compose([
 | SalamandraTAV |        ast |   ca      |   en      |   es      |   gl       |   pt         |   Avg (tgt)    |
 |:--------------|-----------:|----------:|----------:|----------:|-----------:|-------------:|----------:|
-| **ast**       |        -   |  **26.5** |  **27.2** |  **17.3** |       13.5 |     **22.2** |  **21.3** |
-| **ca**        |  **21.2**  |        -  |      39.8 |  **23.3** |   **28.4** |     **30.5** |  **28.6** |
-| **en**        |  **22.3**  |  **38.6** |        -  |  **25.0** |   **32.0** |         38.8 |    31.3   |
-| **es**        |  **13.3**  |  **23.1** |      25.1 |        -  |   **18.2** |     **21.1** |  **20.2** |
-| **gl**        |  **17.4**  |  **28.3** |      32.2 |  **20.9** |          - |     **24.1** |    24.6   |
-| **pt**        |  **18.8**  |  **32.3** |      36.1 |  **21.3** |   **22.0** |           -  |  **26.1** |
-| **Avg (src)** |  **18.6**  |  **29.8** |      32.1 |  **21.6** |   **22.8** |     **27.3** |  **25.4** |
-| SeamlessM4T   |   ast      |   ca      |   en      |   es      |   gl       |   pt         |   Avg (tgt) |
-|:--------------|-----------:|----------:|----------:|----------:|-----------:|-------------:|------:|
-| **ast**       |        -   |    17.6   |    26.4   |    12.5   |  **15.8**  |     14.8     |  17.4 |
-| **ca**        |        -   |      -    |  **40.0** |    20.9   |    26.1    |     27.0     |  28.5 |
-| **en**        |        -   |    36.7   |     -     |    23.7   |    29.7    |   **43.0**   |  **33.3** |
-| **es**        |        -   |    16.8   |  **25.4** |     -     |    15.3    |     14.4     |  18.0 |
-| **gl**        |        -   |    24.0   |  **34.6** |    19.4   |      -     |     23.0     |  **25.3** |
-| **pt**        |        -   |    19.3   |  **38.4** |    13.3   |    15.8    |       -      |  21.7 |
-| **Avg (src)** |        -   |    22.9   |  **33.0** |    18.0   |    20.5    |     24.4     |  23.8 |
 The evaluation results reported here were obtained by manually running inference with the model on the test set. We verified that the results from SeamlessM4T are consistent with the [official results](https://huggingface.co/facebook/seamless-m4t-v2-large) reported by the authors.
@@ -296,23 +288,18 @@ The evaluation results reported here were obtained by manually running inference
 **XCOMET-XL**
-| SalamandraTAV  |     ca     |    en   |       es    |    gl       |  pt         |     Avg (tgt)   |
-|:---------------|-----------:|--------:|------------:|------------:|------------:|-----------:|
-| **ca**         |      -     |  0.9291 |  **0.9283** |  **0.9189** |  **0.9253** | **0.9254** |
-| **en**         | **0.9030** |   -     |  **0.9093** |  **0.9078** |  **0.9164** | **0.9091** |
-| **es**         | **0.9183** |  0.9263 |      -      |  **0.9214** |  **0.9248** | **0.9227** |
-| **gl**         | **0.8994** |  0.9037 |  **0.9029** |      -      |  **0.9009** | **0.9017** |
-| **pt**         | **0.8645** |  0.8851 |  **0.8789** |  **0.8794** |       -     | **0.8770** |
-| **Avg (src)**  | **0.8963** |  0.9111 |  **0.9049** |  **0.9069** |  **0.9169** | **0.9072** |
-| SeamlessM4T    |    ca   |      en     |    es   |    gl   |    pt   |   Avg (tgt)  |
-|:---------------|--------:|------------:|--------:|--------:|--------:|-------:|
-| **ca**         |    -    |  **0.9340** |  0.9042 |  0.9046 |  0.8875 | 0.9076 |
-| **en**         |  0.8837 |       -     |  0.8941 |  0.8878 |  0.8982 | 0.8910 |
-| **es**         |  0.8523 |  **0.9132** |    -    |  0.8754 |  0.8431 | 0.8710 |
-| **gl**         |  0.8617 |  **0.9238** |  0.8775 |    -    |  0.8743 | 0.8843 |
-| **pt**         |  0.7687 |  **0.9068** |  0.7983 |  0.8207 |    -    | 0.8236 |
-| **Avg (src)**  |  0.8416 |  **0.9195** |  0.8685 |  0.8721 |  0.8758 | 0.8755 |
 XCOMET-XL results for Asturian are not reported because it is not supported by this metric.
@@ -327,37 +314,29 @@ XCOMET-XL results for Asturian are not reported because it is not supported by t
 | SalamandraTAV |   ca      |   en      |
 |:--------------|----------:|----------:|
-| **ca**        |        -  |  **37.1** |
 | **en**        |  41.1     |        -  |
 | **es**        |        -  |  **43.9** |
-| **pt**        |        -  |      47.9 |
-| **Avg (src)**       |        -  |      43.0 |
-| SeamlessM4T   |   ca      |   en      |
-|:--------------|----------:|----------:|
-| **ca**        |        -  |      36.8 |
-| **en**        |  **44.0** |        -  |
-| **es**        |        -  |      42.2 |
-| **pt**        |        -  |  **54.1** |
-| **Avg (src)**       |        -  |  **44.4** |
 **XCOMET-XL**
 | SalamandraTAV |   ca      |   en      |
 |:--------------|----------:|----------:|
-| **ca**        |        -  |    0.8834 |
-| **en**        |    0.8450 |        -  |
 | **es**        |        -  |    0.9241 |
-| **pt**        |        -  |    0.8990 |
-| **Avg (src)** |        -  |    0.9022 |
-| SeamlessM4T   |   ca      |   en       |
-|:--------------|----------:|-----------:|
-| **ca**        |        -  | **0.8901** |
-| **en**        | **0.9086**|        -   |
-| **es**        |        -  | **0.9383** |
-| **pt**        |        -  | **0.9530** |
-| **Avg (src)** |        -  | **0.9271** |
 </details>
@@ -372,25 +351,23 @@ Mintzai-ST has overlap with basque_parliament_1, with which we have trained our
 | SalamandraTAV |   es      |   eu      |
 |:--------------|----------:|----------:|
-| **es**        |        -  |       1.2 |
-| **eu**        | **23.7**  |         - |
-| SeamlessM4T   |   es      |   eu      |
-|:--------------|----------:|----------:|
-| **es**        |        -  |  **12.5** |
-| **eu**        |      21.0 |        -  |
 **XCOMET-XL**
 | SalamandraTAV |   es      |   eu      |
 |:--------------|----------:|----------:|
-| **es**        |        -  | **0.7893**|
-| **eu**        | **0.8300**|         - |
-| SeamlessM4T   |   es      |   eu      |
-|:--------------|----------:|----------:|
-| **es**        |        -  |   0.6682  |
-| **eu**        |    0.7185 |        -  |
 </details>
@@ -403,22 +380,14 @@ Mintzai-ST has overlap with basque_parliament_1, with which we have trained our
 **WER**
-| SalamandraTAV |           |
-|:--------------|----------:|
-| ast           | **28.82** |
-| ca            |  **7.37** |
-| en            |     16.63 |
-| es            |      7.59 |
-| gl            |      7.42 |
-| pt            |     22.38 |
-| SeamlessM4T   |           |
-|:--------------|----------:|
-| ca            |      7.89 |
-| en            |  **9.15** |
-| es            |  **5.64** |
-| gl            |  **7.38** |
-| pt            | **19.62** |
 </details>
@@ -429,22 +398,15 @@ Mintzai-ST has overlap with basque_parliament_1, with which we have trained our
 **WER**
-| SalamandraTAV |           |
-|:--------------|----------:|
-| ast           | **25.20** |
-| ca            |  7.61 |
-| en            |  7.79 |
-| es            |  5.87 |
-| gl            |  10.43 |
-| pt            |  10.21 |
-| SeamlessM4T   |           |
-|:--------------|----------:|
-| ca            |  **5.74** |
-| en            |  **7.66** |
-| es            |  **5.30** |
-| gl            |  **8.00** |
-| pt            |  **7.94** |
 </details>
@@ -455,15 +417,11 @@ Mintzai-ST has overlap with basque_parliament_1, with which we have trained our
 **WER**
-| SalamandraTAV |           |
-|:--------------|----------:|
-| es            |  **8.24** |
-| eu            | **19.08** |
-| SeamlessM4T   |           |
-|:--------------|----------:|
-| es            |      9.24 |
-| eu            |     25.69 |
 </details>
@@ -478,18 +436,12 @@ For further information, please send an email to <[email protected]>.
 ### Copyright
 Copyright(c) 2025 by Language Technologies Lab, Barcelona Supercomputing Center.
-[ ] Fede
 ### Funding
-This work has been promoted and financed by the Government of Catalonia through ???.
-[ ] Fede
 ### Acknowledgements
-To do...
-[ ] Fede
 ### Disclaimer
 Be aware that the model may contain biases or other unintended distortions.
@@ -499,8 +451,6 @@ including those governing the use of Artificial Intelligence.
 The Barcelona Supercomputing Center, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.
-[ ] Fede
 ### License
 [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
@@ -510,11 +460,11 @@ The Barcelona Supercomputing Center, as the owner and creator of the model, shal
 If you find our model useful, we would appreciate it if you could cite our work as follows:
 ```
-@misc{bsclt2025,
-  author = {Authors},
-  title = {SalamandraTAV-7b},
-  year = {2025},
-  url = {url},
-  publisher = {Hugging Face}
 }
 ```

 - BSC-LT/salamandraTA-7b-instruct
 ---
 <!-- ![](./images/salamandra_header.png) -->
 # SalamandraTAV-7b Model Card
 ### Training Framework
+The code used to train SalamandraTAV-7b is based on the [Transformers](https://huggingface.co/docs/transformers/) library, and will be publicly available soon.
 ### Compute Infrastructure
 The easiest way to use the model is using the custom pipeline `multimodal_mt`:
 ```python
 from transformers import pipeline
 pipe = pipeline(
   task="multimodal_mt",
   model="langtech-veu/salamandra-TAV-7b",
 Run the S2TT pipeline, specifying the target language:
 ```python
 transcription = pipe(audio_path, mode="asr", **generation_kwargs)
 translation = pipe(transcription, mode="t2tt", tgt_lang="Spanish", **generation_kwargs)
 ```
 Optionally, you can also specify the source language:
 ```python
 transcription = pipe(audio_path, mode="asr", src_lang="English", **generation_kwargs)
 translation = pipe(transcription, mode="t2tt", src_lang="English", tgt_lang="Spanish", **generation_kwargs)
 ```
 Run the S2TT pipeline, specifying the target language:
 ```python
 history = pipe(audio_path, return_chat_history=True, mode="asr", **generation_kwargs)
 translation = pipe(history, mode="t2tt", tgt_lang="Spanish", **generation_kwargs)
 ```
 Optionally, you can also specify the source language:
 ```python
 history = pipe(audio_path, return_chat_history=True, mode="asr", src_lang="English", **generation_kwargs)
 translation = pipe(history, mode="t2tt", src_lang="English", tgt_lang="Spanish", **generation_kwargs)
 ```
 If you are interested in getting the intermediate results, you can do it as follows:
 ```python
 history = pipe(audio_path, return_chat_history=True, mode="asr", **generation_kwargs)
+transcription = history.get_assistant_messages()[-1]
 history = pipe(history, return_chat_history=True, mode="t2tt", tgt_lang="Spanish", **generation_kwargs)
+translation = history.get_assistant_messages()[-1]
+history = pipe(history, return_chat_history=True, mode="lid", **generation_kwargs)
+src_language = history.get_assistant_messages()[-1]
 ```
 ## Training Data
+### Global Summary
+| Data Type     | Hours     | Samples    | Tokens (target)   | Tokens (total) |
+|:--------------|:----------|:-----------|:------------------|:---------------|
+| **ASR**       | 12,147.5h |  5,207,686 |       582,567,674 |  4,180,709,878 |
+| **S2TT**      |      896h |    556,664 |        28,297,402 |    153,376,912 |
+| **T2TT**      |         - |  2,242,354 |       112,837,123 |    220,328,525 |
 ### Automatic Speech Recognition Data
 |  Dataset                                                                                            |  ast   |  ca   |  en   |  es   |  eu   |  gl   |  oc   |  pt   | Total |
 |  Total (hours)                                                                                      |  0.5h  |2010.5h|6729.5h|2497.5h|  544h |  181h |  0.5h |  184h |12147.5h|
 ### Speech-To-Text Translation
+|  Dataset                                                                                            | ca-en | en-ca | en-es | en-pt | es-en | es-pt | pt-en | pt-es | Total |
 |:----------------------------------------------------------------------------------------------------|:------|:------|:------|:------|:------|:------|:------|:------|:------|
 | [CoVoST 2](https://github.com/facebookresearch/covost/) (train)                                     | 135.5h|  430h |       |       |  113h |       | 10.5h |       |  689h |
 | [Europarl-ST v1.1](https://www.mllp.upv.es/europarl-st/) (train)                                    |       |       | 75.5h |  74h  | 20.5h | 12.5h | 14.5h |  9.5h |  207h |
+|  Total (hours)                                                                                      | 135.5h|  430h | 75.5h |  74h  | 133.5h| 12.5h |  25h  |  9.5h |  896h |
 ### Text-To-Text Translation
+For T2TT data, we filtered [Wikimedia](https://dumps.wikimedia.org/other/contenttranslation/20250801/) and [Tatoeba](https://downloads.tatoeba.org/exports/per_language/) datasets using the following criteria:
+1. Sample obtained a [GlotLID v3](https://github.com/cisnlp/GlotLID) target language probability of at least 50%
+2. Sample contains between 5 and 100 words
+3. Sample obtained a [BLASER 2.0](https://huggingface.co/facebook/blaser-2.0-qe) score higher than 3.75
+After filtering, this yields 102,845,818 target tokens for Wikimedia and 9,991,305 target tokens for Tatoeba, with the following language distribution:
+| Language | As Source (samples) |       (%)     | As Target (samples) |      (%)      |
+|:---------|--------------------:|:--------------|--------------------:|:--------------|
+| **ast**  |                848  |          0.0% |            11,800   |          0.5% |
+| **ca**   |             44,278  |          2.0% |           272,250   |         12.1% |
+| **en**   |          1,353,784  |         60.4% |           412,166   |         18.4% |
+| **es**   |            533,330  |         23.8% |           862,276   |         38.5% |
+| **eu**   |              5,266  |          0.2% |            79,534   |          3.5% |
+| **gl**   |             14,052  |          0.6% |            85,510   |          3.8% |
+| **pt**   |            290,754  |         13.0% |           518,776   |         23.1% |
+| **Total**|          2,242,354  |        100.0% |         2,242,354   |        100.0% |
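The three filtering criteria listed above can be sketched as a single predicate. This is only an illustrative sketch, not the authors' pipeline: it assumes the GlotLID probability and the BLASER 2.0 score have already been computed for each sample, that the word count is taken on the target text by whitespace splitting, and that the 5 to 100 word range is inclusive. The `Sample` record and the function names are hypothetical.

```python
from dataclasses import dataclass

# Thresholds taken from the three criteria listed above.
MIN_LID_PROB = 0.50              # GlotLID v3 target language probability
MIN_WORDS, MAX_WORDS = 5, 100    # assumed inclusive word-count range
MIN_BLASER = 3.75                # BLASER 2.0 score must be strictly higher


@dataclass
class Sample:
    # Hypothetical record layout; both scores are assumed to be precomputed.
    target_text: str
    lid_prob: float
    blaser_score: float


def keep_sample(sample: Sample) -> bool:
    """Return True if the sample passes all three criteria."""
    # Assumption: words are counted on the target side by whitespace splitting.
    n_words = len(sample.target_text.split())
    return (
        sample.lid_prob >= MIN_LID_PROB
        and MIN_WORDS <= n_words <= MAX_WORDS
        and sample.blaser_score > MIN_BLASER
    )


def filter_samples(samples):
    """Keep only the samples that satisfy every criterion."""
    return [s for s in samples if keep_sample(s)]
```

A sample with five target words, a language probability of 0.9, and a BLASER score of 4.0 would be kept, while one with a score of exactly 3.75 would be dropped because the third criterion is a strict inequality.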
 | SalamandraTAV |        ast |   ca      |   en      |   es      |   gl       |   pt         |   Avg (tgt)    |
 |:--------------|-----------:|----------:|----------:|----------:|-----------:|-------------:|----------:|
+| **ast**       |        -   |  **26.4** | **27.7**  |  **17.1** |   **19.7** |     **21.6** |  **22.5** |
+| **ca**        |  **22.9**  |        -  |   39.7    |  **22.2** |   **29.8** |     **31.0** |  **29.1** |
+| **en**        |  **24.3**  |  **39.1** |     -     |  **25.5** |   **32.3** |     **43.0** |  **32.8** |
+| **es**        |  **13.7**  |  **20.8** |   24.8    |      -    |   **18.8** |     **19.1** |  **19.4** |
+| **gl**        |  **18.1**  |  **29.4** |   32.4    |  **20.6** |      -     |     **26.0** |  **25.3** |
+| **pt**        |  **19.9**  |  **30.6** |   37.0    |  **20.6** |   **25.6** |        -     |  **26.7** |
+| **Avg (src)** |  **19.8**  |  **29.3** |   32.3    |  **21.2** |   **25.2** |     **28.1** |  **26.0** |
+**SalamandraTAV vs SeamlessM4T BLEU Difference**
+![](./images/s2tt-fleurs-bleu.png)
 The evaluation results reported here were obtained by manually running inference with the model on the test set. We verified that the results from SeamlessM4T are consistent with the [official results](https://huggingface.co/facebook/seamless-m4t-v2-large) reported by the authors.
 **XCOMET-XL**
+| SalamandraTAV  |     ca     |    en      |       es    |    gl       |  pt         |     Avg (tgt)   |
+|:---------------|-----------:|-----------:|------------:|------------:|------------:|-----------:|
+| **ca**         |      -     |  0.9289    |  **0.9114** |  **0.9202** |  **0.9136** | **0.9185** |
+| **en**         | **0.8987** |   -        |  **0.9045** |  **0.8981** |  **0.9077** | **0.9023** |
+| **es**         | **0.9013** | **0.9257** |      -      |  **0.9162** |  **0.9110** | **0.9136** |
+| **gl**         | **0.8930** |  0.9024    |  **0.8890** |      -      |  **0.9005** | **0.8962** |
+| **pt**         | **0.8697** |  0.8912    |  **0.8775** |  **0.8862** |       -     | **0.8812** |
+| **Avg (src)**  | **0.8907** |  0.9121    |  **0.8956** |  **0.9052** |  **0.9082** | **0.9023** |
+**SalamandraTAV vs SeamlessM4T XCOMET-XL Difference**
+![](./images/s2tt-fleurs-xcomet.png)
 XCOMET-XL results for Asturian are not reported because it is not supported by this metric.
 | SalamandraTAV |   ca      |   en      |
 |:--------------|----------:|----------:|
+| **ca**        |        -  |  **37.3** |
 | **en**        |  41.1     |        -  |
 | **es**        |        -  |  **43.9** |
+| **pt**        |        -  |      49.0 |
+| **Avg (src)** |     41.1  |      43.4 |
+**SalamandraTAV vs SeamlessM4T BLEU Difference**
+![](./images/s2tt-covost-bleu.png)
 **XCOMET-XL**
 | SalamandraTAV |   ca      |   en      |
 |:--------------|----------:|----------:|
+| **ca**        |        -  |    0.8835 |
+| **en**        |    0.8454 |        -  |
 | **es**        |        -  |    0.9241 |
+| **pt**        |        -  |    0.8967 |
+| **Avg (src)** |    0.8454 |    0.9014 |
+**SalamandraTAV vs SeamlessM4T XCOMET-XL Difference**
+![](./images/s2tt-covost-xcomet.png)
 </details>
 | SalamandraTAV |   es      |   eu      |
 |:--------------|----------:|----------:|
+| **es**        |        -  |  **21.7** |
+| **eu**        | **26.8**  |         - |
+**SalamandraTAV vs SeamlessM4T BLEU Difference**
+![](./images/s2tt-mintzai-bleu.png)
 **XCOMET-XL**
 | SalamandraTAV |   es      |   eu      |
 |:--------------|----------:|----------:|
+| **es**        |        -  | **0.8000**|
+| **eu**        | **0.8329**|         - |
+**SalamandraTAV vs SeamlessM4T XCOMET-XL Difference**
+![](./images/s2tt-mintzai-xcomet.png)
 </details>
 **WER**
+|               | SalamandraTAV | SeamlessM4T | Whisper v3  | Spire |
+|:--------------|--------------:|------------:|------------:|------:|
+| **ast**       |     **29.35** |           - |           - |   -   |
+| **ca**        |      **7.34** |        7.89 |       14.11 |   -   |
+| **en**        |         16.65 |    **9.15** |       11.13 | 22.56 |
+| **es**        |          7.72 |        5.64 |    **5.21** |   -   |
+| **gl**        |          7.83 |    **7.38** |       14.50 |   -   |
+| **pt**        |         21.80 |       19.62 |    **6.85** |   -   |
 </details>
 **WER**
+|               | SalamandraTAV | SeamlessM4T | Whisper v3  | Spire |
+|:--------------|--------------:|------------:|------------:|------:|
+| **ast**       |     **25.68** |           - |           - |     - |
+| **ca**        |          7.34 |        5.74 |    **4.88** |     - |
+| **en**        |          8.35 |        7.66 |    **4.81** |  9.07 |
+| **es**        |          6.04 |        5.30 |    **2.95** |     - |
+| **gl**        |         11.83 |    **8.00** |       13.61 |     - |
+| **pt**        |         10.55 |        7.94 |    **3.97** |     - |
 </details>
 **WER**
+|               | SalamandraTAV | SeamlessM4T | Whisper v3  |
+|:--------------|--------------:|------------:|------------:|
+| **es**        |          8.21 |        9.24 |    **7.37** |
+| **eu**        |     **19.34** |       25.69 |       48.64 |
 </details>
 ### Copyright
 Copyright(c) 2025 by Language Technologies Lab, Barcelona Supercomputing Center.
 ### Funding
+This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project Modelos del Lenguaje.
 ### Acknowledgements
+The author thankfully acknowledges the computer resources at MareNostrum and the technical support provided by Barcelona Supercomputing Center (RES-IM-2025-2-0027).
 ### Disclaimer
 Be aware that the model may contain biases or other unintended distortions.
 The Barcelona Supercomputing Center, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.
 ### License
 [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 If you find our model useful, we would appreciate it if you could cite our work as follows:
 ```
+@misc{bsclt2025salamandraTAV7b,
+title={salamandra-TAV-7b: a Speech-To-Text Translation model based on an end-to-end Speech LLM for Iberian Languages.},
+author={…; España-Bonet, Cristina},
+organization={Barcelona Supercomputing Center},
+url={https://huggingface.co/langtech-veu/salamandra-TAV-7b},
+year={2025}
 }
 ```

images/s2tt-covost-bleu.png ADDED

images/s2tt-covost-xcomet.png ADDED

images/s2tt-fleurs-bleu.png ADDED

images/s2tt-fleurs-xcomet.png ADDED

images/s2tt-mintzai-bleu.png ADDED

images/s2tt-mintzai-xcomet.png ADDED