Losin94 committed on Jun 6, 2024

Commit 0546a58 (verified)

1 Parent(s): 9b039ab

Update README.md

Files changed (1)

README.md +51 -0

@@ -34,6 +34,57 @@ KeyError: 'qwen2'
 We do not advise you to use base language models for text generation. Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., on this model.
+### Performance
+The evaluation of base models focuses mainly on natural language understanding, general question answering, coding, mathematics, scientific knowledge, reasoning, and multilingual capability.
+The datasets for evaluation include:
+**English Tasks**: MMLU (5-shot), MMLU-Pro (5-shot), GPQA (5-shot), Theorem QA (5-shot), BBH (3-shot), HellaSwag (10-shot), Winogrande (5-shot), TruthfulQA (0-shot), ARC-C (25-shot)
+**Coding Tasks**: EvalPlus (0-shot) (HumanEval, MBPP, HumanEval+, MBPP+), MultiPL-E (0-shot) (Python, C++, Java, PHP, TypeScript, C#, Bash, JavaScript)
+**Math Tasks**: GSM8K (4-shot), MATH (4-shot)
+**Chinese Tasks**: C-Eval (5-shot), CMMLU (5-shot)
+**Multilingual Tasks**: Multi-Exam (M3Exam 5-shot, IndoMMLU 3-shot, ruMMLU 5-shot, mMMLU 5-shot), Multi-Understanding (BELEBELE 5-shot, XCOPA 5-shot, XWinograd 5-shot, XStoryCloze 0-shot, PAWS-X 5-shot), Multi-Mathematics (MGSM 8-shot), Multi-Translation (Flores-101 5-shot)
+#### Qwen2-7B performance
+|  Datasets  |  Mistral-7B  |   Gemma-7B |   Llama-3-8B  |   Qwen1.5-7B  |  Qwen2-7B  |
+| :--------| :---------: | :------------: | :------------: | :------------: | :------------: |
+|# Params | 7.2B | 8.5B | 8.0B | 7.7B | 7.6B  |
+|# Non-emb Params | 7.0B | 7.8B | 7.0B | 6.5B | 6.5B |
+|   ***English***  |    |    |   |    |	    |
+|MMLU | 64.2 | 64.6 | 66.6 | 61.0 | **70.3** |
+|MMLU-Pro | 30.9 | 33.7 | 35.4 | 29.9 | **40.0** |
+|GPQA | 24.7 | 25.7 | 25.8 | 26.7 | **31.8** |
+|Theorem QA | 19.2 | 21.5 | 22.1 | 14.2 | **31.1** |
+|BBH  | 56.1 |  55.1  | 57.7 | 40.2 | **62.6** |
+|HellaSwag  | **83.2** |  82.2  | 82.1 | 78.5 | 80.7 |
+|Winogrande  | 78.4 |  **79.0**  | 77.4 |  71.3 |  77.0 |
+|ARC-C  | 60.0 |  **61.1**  | 59.3 | 54.2 |  60.6 |
+|TruthfulQA  | 42.2 |  44.8  | 44.0 | 51.1 |  **54.2** |
+|   ***Coding***  |    |    |   |    |	    |
+|HumanEval | 29.3 | 37.2 | 33.5 | 36.0 | **51.2**  |
+|MBPP | 51.1 | 50.6 | 53.9 | 51.6 | **65.9**  |
+|EvalPlus | 36.4 | 39.6 | 40.3 | 40.0 | **54.2**  |
+|MultiPL-E | 29.4 | 29.7 | 22.6 | 28.1 | **46.3**  |
+|   ***Mathematics***  |    |    |   |    |	    |
+|GSM8K | 52.2 |  46.4  | 56.0 | 62.5 | **79.9** |
+|MATH  | 13.1 |  24.3  | 20.5 | 20.3 | **44.2** |
+|   ***Chinese***  |    |    |   |    |	    |
+|C-Eval   | 47.4 |   43.6    |  49.5 |  74.1 |  **83.2** |
+|CMMLU   | - |   -    | 50.8 | 73.1 | **83.9** |
+|   ***Multilingual***  |    |    |   |    |	    |
+|Multi-Exam   | 47.1 |   42.7    |  52.3 |  47.7 |  **59.2** |
+|Multi-Understanding | 63.3 |  58.3    |  68.6 |  67.6 |  **72.0** |
+|Multi-Mathematics | 26.3 |   39.1    |  36.3 |  37.3 |  **57.5** |
+|Multi-Translation | 23.3 |   31.2    |  **31.9** |  28.4 |  31.5 |
 ## Citation
 If you find our work helpful, feel free to give us a cite.
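
The dataset list in the added Performance section gives a shot count for each benchmark (for example, MMLU 5-shot or TruthfulQA 0-shot) but does not describe how the prompts were assembled. As a rough illustration of what a k-shot setting usually means, the sketch below concatenates k solved examples ahead of the test question. The `build_few_shot_prompt` helper, the `Question:`/`Answer:` template, and the arithmetic demonstrations are all hypothetical and are not taken from the Qwen2 evaluation setup.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]], question: str, k: int
) -> str:
    """Return a prompt with k solved examples followed by the open question.

    The Question/Answer template is an assumption for illustration only;
    real benchmarks each define their own prompt format.
    """
    if k < 0 or k > len(examples):
        raise ValueError("k must be between 0 and the number of examples")
    # Each demonstration shows the model a question together with its answer.
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples[:k]]
    # The test question is left unanswered so the model completes it.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Hypothetical demonstrations, not drawn from any benchmark listed above.
demos = [
    ("What is 2 + 3?", "5"),
    ("What is 7 - 4?", "3"),
    ("What is 6 * 2?", "12"),
    ("What is 9 / 3?", "3"),
]

# A 4-shot prompt, matching the shot count listed for GSM8K and MATH.
prompt = build_few_shot_prompt(demos, "What is 8 + 5?", k=4)
print(prompt)
```

With `k=0` the same helper returns only the unanswered question, which corresponds to the 0-shot settings in the list. A base model such as Qwen2-7B continues the text after the final `Answer:`, which is why this style of prompt suits base-model evaluation without a chat template.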