Instructions to use abacusai/Smaug-72B-v0.1 with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use abacusai/Smaug-72B-v0.1 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="abacusai/Smaug-72B-v0.1")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("abacusai/Smaug-72B-v0.1")
model = AutoModelForCausalLM.from_pretrained("abacusai/Smaug-72B-v0.1")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use abacusai/Smaug-72B-v0.1 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "abacusai/Smaug-72B-v0.1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "abacusai/Smaug-72B-v0.1",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker
docker model run hf.co/abacusai/Smaug-72B-v0.1
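Both the vLLM and SGLang setups here expose an OpenAI-compatible `/v1/completions` endpoint, so the curl request can also be sent from Python. This is a minimal standard-library sketch, assuming a server is already running at the address from the examples (port 8000 for `vllm serve`, port 30000 for SGLang). `build_completion_request` and `complete` are hypothetical helper names, and the response parsing assumes the OpenAI-style `choices[0]["text"]` layout.

```python
import json
import urllib.request

# Address used by the vLLM example; the SGLang example listens on port 30000 instead.
BASE_URL = "http://localhost:8000"


def build_completion_request(prompt: str, max_tokens: int = 512,
                             temperature: float = 0.5,
                             model: str = "abacusai/Smaug-72B-v0.1") -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str, base_url: str = BASE_URL, max_tokens: int = 512,
             temperature: float = 0.5, timeout: float = 600.0) -> str:
    """POST to /v1/completions and return the text of the first choice.

    Assumes an OpenAI-compatible server (e.g. vLLM or SGLang) is already running.
    """
    body = json.dumps(build_completion_request(prompt, max_tokens, temperature))
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # OpenAI-style completion responses carry generated text in choices[i]["text"].
    return payload["choices"][0]["text"]
```

With a server up, `complete("Once upon a time,")` targets vLLM, and `complete("Once upon a time,", base_url="http://localhost:30000")` targets SGLang. Nothing is sent when the module is imported, so the payload builder can be checked without a running server.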
- SGLang
How to use abacusai/Smaug-72B-v0.1 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "abacusai/Smaug-72B-v0.1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "abacusai/Smaug-72B-v0.1",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "abacusai/Smaug-72B-v0.1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "abacusai/Smaug-72B-v0.1",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
- Docker Model Runner
How to use abacusai/Smaug-72B-v0.1 with Docker Model Runner:
docker model run hf.co/abacusai/Smaug-72B-v0.1
Congratulations!
Congratulations! Average 80.48
@LoneStriker would definitely love to try an exl2 quant of this, ideally an 8.0bpw one.
First model to reach 80%!
> @LoneStriker would definitely love to try an exl2 quant of this, ideally an 8.0bpw one.
Qwen 72B is not yet supported by exl2. I'll quantize the model if/when it is supported; I've been wanting to run it with exl2 myself since it came out...
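For sizing such a request: exl2's bpw figure is the average number of bits stored per weight, so the weights alone take roughly parameters × bpw / 8 bytes. This back-of-the-envelope sketch assumes a nominal 72 billion parameters (read off the model name, not an exact count) and ignores the KV cache, activations, and file-format overhead, so real memory use is higher.

```python
def quantized_weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes (1e9 bytes).

    Ignores KV cache, activations, and format overhead.
    """
    total_bits = n_params * bits_per_weight
    return total_bits / 8 / 1e9


# Nominal parameter count assumed from the "72B" in the model name.
N_PARAMS = 72e9

for bpw in (4.0, 6.0, 8.0, 16.0):
    print(f"{bpw:>4} bpw -> ~{quantized_weight_size_gb(N_PARAMS, bpw):.0f} GB")
```

Under these assumptions an 8.0bpw quant needs on the order of 72 GB for the weights alone, versus about 144 GB at 16 bits and 36 GB at 4.0bpw.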
Nice!
This model's derived from Qwen-72B, so take the scores with a grain of salt. Qwen is one of those base models that likely included test data in their pretraining, so apply a handicap to other models for a fair comparison.
Regardless, thanks for sharing this new model @ArkaAbacus and team @abacusai! :)
If you have the spare compute to take requests or challenges, I'm very curious to see whether your training method can improve on https://huggingface.co/allenai/tulu-2-dpo-70b, a Llama-2-70B-based model. That would give a more direct comparison of how well the method pushes the envelope.
@Ont Qwen-72B does really well on EQ-Bench, which is definitely not the result of training on test data.
https://eqbench.com/
I just ran fresh correlations against Arena Elo, and EQ-Bench looks really promising.
Spearman Correlations:
EQ-bench v2: 0.863
MT-bench: 0.891
Alpaca v2: 0.899
Kendall's Tau:
EQ-bench v2: 0.730
MT-bench: 0.759
Alpaca v2: 0.759
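The figures above are rank correlations between each benchmark's scores and Arena Elo across a set of models. Spearman's rho is the Pearson correlation of the ranks; Kendall's tau compares every pair of models and asks whether both rankings order the pair the same way. This is a minimal standard-library sketch of both, using made-up scores for five hypothetical models (not the data behind the numbers above). It assumes the tie-corrected tau-b variant, which is SciPy's `kendalltau` default; the comment does not say which variant was used.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, normalized for ties."""
    n_pairs = concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(len(x)), 2):
        n_pairs += 1
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx * dy > 0:
            concordant += 1  # both sequences order this pair the same way
        elif dx * dy < 0:
            discordant += 1  # the sequences disagree on this pair
    return (concordant - discordant) / sqrt((n_pairs - tied_x) * (n_pairs - tied_y))


# Hypothetical numbers for five made-up models (NOT real leaderboard data).
arena_elo = [1250, 1180, 1120, 1100, 1050]
bench_score = [8.9, 8.1, 7.9, 8.3, 7.0]

print(f"Spearman: {spearman(arena_elo, bench_score):.3f}")            # -> 0.700
print(f"Kendall tau-b: {kendall_tau_b(arena_elo, bench_score):.3f}")  # -> 0.600
```

On the same data, Kendall's tau is typically smaller in magnitude than Spearman's rho, which is consistent with the Kendall figures above sitting below the Spearman ones.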
Does this mean that the base model does well on everything? Definitely not, but it shows that it isn't simply a number-gymnastics model. Anyone who has tried Qwen probably knows this already.
(Also notice all the Dolphins up there on that leaderboard. I don't know how much @ehartford contributed to this model, but Qwen + the marine biologist guy looks like a good combination to me.)
> > @LoneStriker would definitely love to try an exl2 quant of this, ideally an 8.0bpw one.
> Qwen 72B is not yet supported by exl2. I'll quantize the model if/when it is supported; I've been wanting to run it with exl2 myself since it came out...
I think this is a llamafied version that just uses a different tokenizer. So it can't be converted to GGUF, but maybe to exl2?
Unfortunately, the exl2 quant fails. With llama.cpp's GGUF conversion I was able to get the model to convert, but the resulting GGUF file would not load for me, so I've taken my GGUF quants offline until I can figure out why.
> (Also notice all the Dolphins up there on that leaderboard. I don't know how much @ehartford contributed to this model, but Qwen + the marine biologist guy looks like a good combination to me.)
This work is unrelated; it was led by @ArkaAbacus.