Instructions for using sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
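
When the pipeline is given a list of chat messages, it is expected to return the whole conversation under `generated_text`, with the model's answer as the last message. A minimal sketch of pulling out just that answer is shown here. The helper name `last_reply` is hypothetical and the result layout is an assumption about chat-style pipeline output, so inspect `pipe(messages)` on your installed version before relying on it.

```python
def last_reply(result):
    """Return the assistant text from a chat-style text-generation result.

    Assumed layout:
    [{"generated_text": [{"role": "user", ...},
                         {"role": "assistant", "content": "..."}]}]
    """
    conversation = result[0]["generated_text"]
    if isinstance(conversation, str):
        # Plain-string prompts yield a plain string instead of a message list.
        return conversation
    return conversation[-1]["content"]


# Hand-written stand-in for what `pipe(messages)` is assumed to return.
example = [{"generated_text": [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "(model reply)"},
]}]
print(last_reply(example))
```

With a live pipeline, the call would be `last_reply(pipe(messages))`.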

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16")
model = AutoModelForCausalLM.from_pretrained("sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
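
The same steps can be wrapped in two small helpers so that the model is loaded once and reused across turns. This is a sketch under stated assumptions, not the authors' recommended setup: the function names are hypothetical, `torch_dtype=torch.float16` is a guess based on the "f16" in the repository name, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
MODEL_ID = "sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16"


def load_model(model_id=MODEL_ID):
    """Load tokenizer and model in half precision (downloads the weights)."""
    # Imported lazily so generate_reply can be used without importing torch here.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # assumption: matches the "f16" in the repo name
        device_map="auto",          # assumption: `accelerate` is installed
    )
    return tokenizer, model


def generate_reply(tokenizer, model, messages, max_new_tokens=40):
    """Generate one assistant turn and return only the newly generated text."""
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # generate() returns prompt + completion, so skip the prompt tokens.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
```

Typical use would be `tokenizer, model = load_model()` followed by `generate_reply(tokenizer, model, [{"role": "user", "content": "Who are you?"}])`. Passing `skip_special_tokens=True` drops end-of-turn markers from the decoded text.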

Local Apps

vLLM

How to use sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
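
The curl call above can also be made from Python. The sketch below uses only the standard library and sends the same JSON body to the same endpoint. The helper names `build_chat_payload` and `chat` are hypothetical, and the response is assumed to follow the OpenAI chat-completions layout (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16"


def build_chat_payload(prompt, model=MODEL_ID):
    """Build the same JSON body as the curl example."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def chat(prompt, base_url="http://localhost:8000", timeout=60):
    """POST one user message to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses put the text under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

Once the server is running, `print(chat("What is the capital of France?"))` should print the answer. The same function can target another OpenAI-compatible server by passing a different `base_url`, for example `http://localhost:30000`.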

Use Docker

docker model run hf.co/sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16

SGLang

How to use sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
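
A freshly started server needs time to load the weights before it answers requests, so a script that launches it and calls it immediately can fail with a refused connection. The sketch below waits for the server by polling `GET /v1/models`, which is assumed here to be served as part of the OpenAI-compatible API. The helper name `wait_until_ready` and the default timings are hypothetical.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url="http://localhost:30000", timeout=300.0, interval=2.0):
    """Poll GET {base_url}/v1/models until it answers 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready()` before the first chat request returns `True` as soon as the endpoint responds and `False` if the timeout expires first.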

Unsloth Studio

How to use sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16 to start chatting

Load model with FastModel

# Install first (shell): pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16",
    max_seq_length=2048,
)

Docker Model Runner
How to use sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16 with Docker Model Runner:
```
docker model run hf.co/sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Uploaded model

Developed by: sandbox-ai
License: apache-2.0
Finetuned from model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit

Evaluation Metrics

Task	Name	Description	Language	Metric	Task type
AQuAS	AQuAS	Abstractive Question-Answering in Spanish	ES	sas_encoder	Abstractive QA
ARC_ca	ARC_ca	Grade-school level science questions in Catalan	CA	acc	Multi choice QA
BEC2016eu	BEC2016eu	Basque Election Campaign 2016 Opinion Dataset	EU	f1	Sentiment Analysis
Belebele Glg	Belebele Glg	Reading Comprehension in Galician	GL	acc	Reading Comprehension
BertaQA	BertaQA	Trivia dataset with global and local questions about the Basque Country	EU	acc	Multi choice QA
BHTCv2	BHTCv2	Topic Classification of News Headlines in Basque	EU	f1	Classification, Topic Classification
caBREU	caBREU	Article Summarization in Catalan	CA	bleu	Summarization
CatalanQA	CatalanQA	Extractive QA in Catalan	CA	f1	Extractive QA
CatCoLA	CatCoLA	Linguistic Acceptability in Catalan	CA	mcc	Linguistic Acceptability
ClinDiagnosES	ClinDiagnosES	Diagnosis of clinical cases in Spanish	ES	sas_encoder	Open QA
ClinTreatES	ClinTreatES	Treatment for clinical cases in Spanish	ES	sas_encoder	Open QA
COPA_ca	COPA_ca	Choice Of Plausible Alternatives in Catalan	CA	acc	Reasoning
CoQCat	CoQCat	Conversational Question Answering in Catalan	CA	f1	Extractive QA
Crows Pairs Spanish	Crows Pairs Spanish	Bias evaluation using stereotypes	ES	pct_stereotype	Bias Detection
EpecKorrefBin	EpecKorrefBin	Coreference resolution in Basque	EU	acc	Coreference Resolution, Textual Entailment
EsCoLA	EsCoLA	Spanish Corpus of Linguistic Acceptability	ES	mcc	Linguistic Acceptability
EusExams	EusExams	Public Service examinations questions in Basque	EU	acc	Multi choice QA
EusProficiency	EusProficiency	C1-level proficiency questions in Basque	EU	acc	Multi choice QA
EusReading	EusReading	EGA exams reading comprehension in Basque	EU	acc	Multi choice QA
EusTrivia	EusTrivia	Trivia questions in Basque	EU	acc	Multi choice QA
Fake News ES	Fake News ES	Fake News Detection in Spanish	ES	acc	Classification
GalCoLA	GalCoLA	Galician Corpus of Linguistic Acceptability	GL	mcc	Linguistic Acceptability
HumorQA	HumorQA	White humour joke classification	ES	acc	Classification
MGSM_ca	MGSM_ca	Grade-school math problems in Catalan	CA	exact_match	Math Reasoning
MGSM_es	MGSM_es	Grade-school math problems in Spanish	ES	exact_match	Math Reasoning
MGSM_eu	MGSM_eu	Grade-school math problems in Basque	EU	exact_match	Math Reasoning
MGSM_gl	MGSM_gl	Grade-school math problems in Galician	GL	exact_match	Math Reasoning
NoticIA	NoticIA	A Clickbait Article Summarization Dataset in Spanish	ES	rouge1	Summarization
OffendES	OffendES	Classification of offensive comments in Spanish	ES	acc	Classification
OpenBookQA_ca	OpenBookQA_ca	Multi-step reasoning QA in Catalan	CA	acc	Reasoning
OpenBookQA_gl	OpenBookQA_gl	Multi-step reasoning QA in Galician	GL	acc	Reasoning
Parafraseja	Parafraseja	Paraphrase identification in Catalan	CA	acc	Paraphrasing
ParafrasesGL	ParafrasesGL	Paraphrase identification in Galician	GL	acc	Paraphrasing
PAWS_ca	PAWS_ca	Paraphrase Adversaries from Word Scrambling in Catalan	CA	acc	Paraphrasing
PAWS-X_es	PAWS-X_es	Paraphrase Adversaries from Word Scrambling in Spanish	ES	acc	Paraphrasing
PAWS_gl	PAWS_gl	Paraphrase Adversaries from Word Scrambling in Galician	GL	acc	Paraphrasing
PIQA_ca	PIQA_ca	Physical Interaction QA in Catalan	CA	acc	Reasoning
QNLIeu	QNLIeu	Textual Entailment in Basque	EU	acc	NLI, Textual Entailment
RagQuAS	RagQuAS	Retrieval-Augmented-Generation and Question-Answering in Spanish	ES	sas_encoder	Abstractive QA
SIQA_ca	SIQA_ca	Social Interaction QA in Catalan	CA	acc	Reasoning
SpaLawEx	SpaLawEx	Spanish Law School Access Exams	ES	acc	Multi choice QA
SummarizationGL	SummarizationGL	Abstractive Summarization in Galician	GL	bleu	Summarization
TE-ca	TE-ca	Textual Entailment in Catalan	CA	acc	Textual Entailment
TELEIA	TELEIA	Test de Español como Lengua Extranjera para Inteligencia Artificial	ES	acc	Multi choice QA
VaxxStance	VaxxStance	Stance detection on the Antivaxxers movement	EU	f1	Sentiment Analysis, Stance Detection
WiCeu	WiCeu	Word sense disambiguation in Basque	EU	acc	Textual Entailment
WNLI_ca	WNLI_ca	Winograd-schema-type dataset in Catalan	CA	acc	NLI, Textual Entailment
WNLI ES	WNLI ES	Winograd-schema-type dataset in Spanish	ES	acc	NLI, Textual Entailment
XCOPA_eu	XCOPA_eu	Choice Of Plausible Alternatives in Basque	EU	acc	Reasoning
XNLI_ca	XNLI_ca	Cross-lingual Natural Language Inference in Catalan	CA	acc	NLI, Textual Entailment
XNLI_es	XNLI_es	Cross-lingual Natural Language Inference in Spanish	ES	acc	NLI
XNLI_eu	XNLI_eu	Cross-lingual Natural Language Inference in Basque	EU	acc	NLI, Textual Entailment
XQuAD_ca	XQuAD_ca	Cross-lingual Question Answering Dataset in Catalan	CA	f1	Extractive QA
XQuAD_es	XQuAD_es	Cross-lingual Question Answering Dataset in Spanish	ES	f1	Extractive QA
xStoryCloze_ca	xStoryCloze_ca	Narrative completion in Catalan	CA	acc	Reasoning
xStoryCloze_es	xStoryCloze_es	Narrative completion in Spanish	ES	acc	Reasoning
xStoryCloze_eu	xStoryCloze_eu	Narrative completion in Basque	EU	acc	Reasoning

Usage:

You can run the model with the Hugging Face Transformers library on 2 or more 80GB GPUs (NVIDIA Ampere or newer), with at least 150GB of free disk space to accommodate the download.

This code has been tested with Transformers v4.44.0, torch v2.4.0, and 2 A100 80GB GPUs, but any setup that supports meta-llama/Llama-3.1-70B-Instruct should support this model as well. If you run into problems, consider running pip install -U transformers.
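
A quick way to sanity-check hardware sizing is to estimate the memory taken by the weights alone: parameter count times bytes per parameter. The sketch below does that arithmetic. The parameter count is an assumption read off the "8b" in the model name, the 2 bytes per parameter assumes 16-bit weights, and the helper name is hypothetical. Activations, the KV cache, and framework overhead come on top of this figure.

```python
def weight_memory_gib(n_params, bytes_per_param=2):
    """Approximate size of the weights alone, in GiB."""
    return n_params * bytes_per_param / 1024**3


# Assumption: roughly 8 billion parameters, stored as 16-bit floats.
print(f"{weight_memory_gib(8e9):.1f} GiB")  # prints "14.9 GiB"
```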

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sandbox-ai/Llama-3.1-Tango-8b-Instruct-f16")
messages = [{"role": "user", "content": "Who are you?"}]
print(pipe(messages))

Reference(s):

TODO

Model Architecture:

Architecture Type: Transformer
Network Architecture: Llama 3.1

Input:

Input Type(s): Text
Input Format: String
Input Parameters: One Dimensional (1D)
Other Properties Related to Input: Max of 128k tokens

Output:

Output Type(s): Text
Output Format: String
Output Parameters: One Dimensional (1D)
Other Properties Related to Output: Max of 4k tokens
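
The two limits above can be enforced before calling the model. The sketch below rejects prompts that exceed the input limit and caps the requested generation length at the output limit. Reading "128k" as 131,072 tokens and "4k" as 4,096 tokens is an assumption, and the helper name `plan_generation` is hypothetical.

```python
MAX_INPUT_TOKENS = 128 * 1024  # assumption: "128k" read as 131,072 tokens
MAX_OUTPUT_TOKENS = 4 * 1024   # assumption: "4k" read as 4,096 tokens


def plan_generation(prompt_tokens, requested_new_tokens):
    """Return a max_new_tokens value within the stated limits.

    Raises ValueError if the prompt alone exceeds the input limit.
    """
    if prompt_tokens > MAX_INPUT_TOKENS:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens, limit is {MAX_INPUT_TOKENS}"
        )
    return min(requested_new_tokens, MAX_OUTPUT_TOKENS)


print(plan_generation(prompt_tokens=1_000, requested_new_tokens=8_000))  # prints 4096
```

The prompt length can be taken from `inputs["input_ids"].shape[-1]` after applying the chat template, and the returned value passed as `max_new_tokens` to `model.generate`.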

Training & Evaluation:

TODO

Dataset:

MessIRve: A Large-Scale Spanish Information Retrieval Dataset

spanish-ir/messirve
messi_mod-v0.0.2: tatakof/messi_mod-v0.0.2

Citation

@article{valentini2024messirve,
      title={MessIRve: A Large-Scale Spanish Information Retrieval Dataset}, 
      author={Francisco Valentini and Viviana Cotik and Damián Furman and Ivan Bercovich and Edgar Altszyler and Juan Manuel Pérez},
      year={2024},
      eprint={2409.05994},
      journal={arxiv:2409.05994},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.05994}, 
}

@misc{wang2024helpsteer2preferencecomplementingratingspreferences,
      title={HelpSteer2-Preference: Complementing Ratings with Preferences}, 
      author={Zhilin Wang and Alexander Bukharin and Olivier Delalleau and Daniel Egert and Gerald Shen and Jiaqi Zeng and Oleksii Kuchaiev and Yi Dong},
      year={2024},
      eprint={2410.01257},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.01257}, 
}

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)