SalamandraTA-7B-instruct-GGUF Model Card
This model is the GGUF-quantized version of SalamandraTA-7b-instruct.
The model weights are quantized from FP16 to Q8_0 (8-bit quantization), Q4_K_M (4-bit k-quant weights) and Q3_K_M (3-bit k-quant weights) using the llama.cpp framework. Inference with this model can be done using vLLM.
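As a rough illustration of what 8-bit block quantization such as Q8_0 involves, the sketch below quantizes one block of 32 weights to signed 8-bit integers that share a single scale. It is a simplified pure-Python approximation, not the llama.cpp implementation: the function names and sample weights are made up for the example, and the scale is kept as a Python float rather than stored in 16 bits.

```python
BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32 values


def quantize_block(block):
    """Map one block of floats to int8-range values plus one shared scale."""
    amax = max(abs(x) for x in block)
    # Choose the scale so the largest magnitude maps to 127.
    scale = amax / 127.0 if amax > 0 else 1.0
    quants = [round(x / scale) for x in block]
    return scale, quants


def dequantize_block(scale, quants):
    """Recover approximate floats from the quantized block."""
    return [scale * q for q in quants]


# Hypothetical sample weights in [-1, 1], only for demonstration.
weights = [((i * 37) % 101 - 50) / 50.0 for i in range(BLOCK_SIZE)]

scale, quants = quantize_block(weights)
restored = dequantize_block(scale, quants)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Storage cost per block: 32 one-byte values plus a 2-byte scale.
bits_per_weight = (BLOCK_SIZE * 8 + 16) / BLOCK_SIZE

print(f"scale={scale:.6f} max_abs_error={max_error:.6f}")
print(f"bits per weight: {bits_per_weight}")
```

The rounding error of each weight is at most half the scale, which is why 8-bit files stay close to the FP16 weights while the 4-bit and 3-bit variants trade more accuracy for smaller size.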
SalamandraTA-7b-instruct is a translation LLM that has been instruction-tuned from SalamandraTA-7b-base. The base model results from continually pre-training Salamandra-7b on parallel data and has not been published, but is reserved for internal use. SalamandraTA-7b-instruct is proficient in 35 European languages (plus 3 varieties) and supports translation-related tasks, namely: sentence-level-translation, paragraph-level-translation, document-level-translation, automatic post-editing, grammar checking, machine translation evaluation, alternative translations, named-entity-recognition and context-aware translation.
DISCLAIMER: This version of Salamandra is tailored exclusively for translation tasks. It lacks chat capabilities and has not been trained with any chat instructions.
The entire Salamandra family is released under a permissive Apache 2.0 license.
How to Use
The following example code was tested with Python 3.10.4, vllm==0.7.3, torch==2.5.1 and torchvision==0.20.1, and should also run on current versions of these libraries. It translates a Spanish passage into English:
from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams

# Download the GGUF repository and load the 4-bit file with vLLM.
model_dir = snapshot_download(repo_id="BSC-LT/salamandraTA-7B-instruct-GGUF", revision="main")
model_name = "salamandraTA_7b_inst_q4.gguf"
llm = LLM(model=model_dir + '/' + model_name, tokenizer=model_dir)

source = "Spanish"
target = "English"
sentence = "Ayer se fue, tomó sus cosas y se puso a navegar. Una camisa, un pantalón vaquero y una canción, dónde irá, dónde irá. Se despidió, y decidió batirse en duelo con el mar. Y recorrer el mundo en su velero. Y navegar, nai-na-na, navegar."

# Build the translation prompt; each language label starts on its own line.
prompt = f"Translate the following text from {source} into {target}.\n{source}: {sentence} \n{target}:"
messages = [{'role': 'user', 'content': prompt}]

# Generate the translation with a low temperature for near-deterministic output.
outputs = llm.chat(
    messages,
    sampling_params=SamplingParams(
        temperature=0.1,
        stop_token_ids=[5],
        max_tokens=200,
    ),
)[0].outputs

print(outputs[0].text)
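To reuse the same prompt format for other language pairs, the template can be wrapped in a small helper. This is only a sketch: the names `build_translation_prompt` and `build_messages` are hypothetical, and it assumes the separators in the template are meant to be real newline characters.

```python
def build_translation_prompt(source: str, target: str, text: str) -> str:
    """Fill the translation template with a language pair and the text to translate."""
    return (
        f"Translate the following text from {source} into {target}.\n"
        f"{source}: {text} \n"
        f"{target}:"
    )


def build_messages(source: str, target: str, text: str) -> list:
    """Wrap the prompt in the single-turn message list expected by a chat API."""
    return [{"role": "user", "content": build_translation_prompt(source, target, text)}]


messages = build_messages("Spanish", "English", "Ayer se fue.")
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the one passed to `llm.chat` in the example above.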
Additional information
Author
The Language Technologies Unit from Barcelona Supercomputing Center.
Contact
For further information, please send an email to langtech@bsc.es.
Copyright
Copyright (c) 2025 by Language Technologies Unit, Barcelona Supercomputing Center.
Funding
This work has been promoted and financed by the Government of Catalonia through the Aina Project.
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of ILENIA Project with reference 2022/TL22/00215337.
Acknowledgements
The success of this project has been made possible thanks to the invaluable contributions of our partners in the ILENIA Project: HiTZ and CiTIUS. Their efforts have been instrumental in advancing our work, and we sincerely appreciate their help and support.
Disclaimer
Be aware that the model may contain biases or other unintended distortions. When third parties deploy systems or provide services based on this model, or use the model themselves, they bear the responsibility for mitigating any associated risks and ensuring compliance with applicable regulations, including those governing the use of Artificial Intelligence.
The Barcelona Supercomputing Center, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.
License
Apache License, Version 2.0
