Instructions for using PrimeIntellect/INTELLECT-3 with libraries and local apps.

Libraries

How to use PrimeIntellect/INTELLECT-3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PrimeIntellect/INTELLECT-3", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PrimeIntellect/INTELLECT-3", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("PrimeIntellect/INTELLECT-3", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
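
Because INTELLECT-3 is a reasoning model, the decoded text will usually contain a reasoning trace before the final answer. The helper below is a minimal sketch of separating the two. It assumes DeepSeek-R1-style `<think>...</think>` delimiters (an assumption suggested by the `deepseek_r1` reasoning parser used in the README's vLLM commands, not something the README states); `split_reasoning` is a hypothetical name introduced for illustration.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Split decoded model output into (reasoning, answer).

    Assumption: reasoning is wrapped in <think>...</think>. The opening tag
    may be absent if the chat template already placed it in the prompt.
    """
    head, sep, tail = text.partition(close_tag)
    reasoning = head.strip().removeprefix(open_tag).strip()
    if not sep:
        # No closing tag: treat everything as unfinished reasoning.
        return reasoning, ""
    return reasoning, tail.strip()


# Hand-written strings, not real model output:
print(split_reasoning("<think>2+2=4</think>The answer is 4."))
```

With `max_new_tokens=40`, as in the snippet above, generation will likely stop before the reasoning trace closes, so the answer part may come back empty; a larger token budget is probably needed in practice.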

Local Apps

vLLM

How to use PrimeIntellect/INTELLECT-3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "PrimeIntellect/INTELLECT-3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PrimeIntellect/INTELLECT-3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
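
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above (same URL, model name, and message shape); the helper names `build_chat_request` and `chat` are hypothetical, and the timeout value is an arbitrary assumption.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # same host and port as the curl call
MODEL = "PrimeIntellect/INTELLECT-3"


def build_chat_request(prompt, model=MODEL, base_url=BASE_URL):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=600.0):
    """Send the prompt and return the first choice's message content."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Requires the vLLM server from the previous step to be running:
# print(chat("What is the capital of France?"))
```

For the SGLang server described further down, the same sketch should work with `BASE_URL` pointed at port 30000.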

Use Docker

docker model run hf.co/PrimeIntellect/INTELLECT-3

SGLang

How to use PrimeIntellect/INTELLECT-3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "PrimeIntellect/INTELLECT-3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PrimeIntellect/INTELLECT-3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "PrimeIntellect/INTELLECT-3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PrimeIntellect/INTELLECT-3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use PrimeIntellect/INTELLECT-3 with Docker Model Runner:
```
docker model run hf.co/PrimeIntellect/INTELLECT-3
```



	---
	library_name: transformers
	tags:
	- prime-rl
	- verifiers
	- prime-intellect
	- reinforcement-learning
	- reasoning
	- agentic
	- mixture-of-experts
	license: mit
	language:
	- en
	base_model:
	- zai-org/GLM-4.5-Air-Base
	pipeline_tag: text-generation
	---

	# INTELLECT-3

	<div align="center">
	<img src="banner.png" alt="Prime Intellect Logo" />
	</div>

	<p align="center">
	<strong>INTELLECT-3: A 100B+ MoE trained with large-scale RL</strong>
	<br><br>
	Trained with <a href="https://github.com/PrimeIntellect-ai/prime-rl">prime-rl</a> and <a href="https://github.com/PrimeIntellect-ai/verifiers">verifiers</a>
	<br>
	Environments released on <a href="https://app.primeintellect.ai/dashboard/environments">Environments Hub</a>
	<br>
	Read the <a href="https://primeintellect.ai/blog/intellect-3">Blog</a> & <a href="https://storage.googleapis.com/intellect-3-paper/INTELLECT_3_Technical_Report.pdf">Technical Report</a>
	<br>
	<a href="https://x.com/primeintellect">X</a> | <a href="https://discord.gg/RC5GvMbfDf">Discord</a> | <a href="https://app.primeintellect.ai/dashboard/create-cluster">Prime Intellect Platform</a>
	</p>

	## Introduction

	INTELLECT-3 is a 106B (A12B) parameter Mixture-of-Experts reasoning model post-trained from [GLM-4.5-Air-Base](https://huggingface.co/zai-org/GLM-4.5-Air-Base) using supervised fine-tuning (SFT) followed by large-scale reinforcement learning (RL).

	![bench](bench.png)

	Training was performed with [prime-rl](https://github.com/PrimeIntellect-ai/prime-rl) using environments built with the [verifiers](https://github.com/PrimeIntellect-ai/verifiers) library.
	All training and evaluation environments are available on the [Environments Hub](https://app.primeintellect.ai/dashboard/environments).

	The model, training frameworks, and environments are open-sourced under fully permissive licenses (MIT and Apache 2.0).

	For more details, see the [technical report](https://storage.googleapis.com/intellect-3-paper/INTELLECT_3_Technical_Report.pdf).

	## Evaluation

	Among the models listed below, INTELLECT-3 posts the highest scores on MATH-500, AIME24, and AIME25, and outscores GLM-4.5-Air on every benchmark:

	| Model | MATH-500 | AIME24 | AIME25 | LCB | GPQA | HLE | MMLU-Pro |
	|-------|----------|--------|--------|-----|------|-----|----------|
	| INTELLECT-3 | 98.1 | 90.8 | 88.0 | 69.3 | 74.4 | 14.6 | 81.9 |
	| GLM-4.5-Air | 97.8 | 84.6 | 82.0 | 61.5 | 73.3 | 13.3 | 73.9 |
	| GLM-4.5 | 97.0 | 85.8 | 83.3 | 64.5 | 77.0 | 14.8 | 83.5 |
	| DeepSeek R1 0528 | 87.3 | 83.2 | 73.4 | 62.5 | 77.5 | 15.9 | 75.3 |
	| DeepSeek v3.2 | 96.8 | 88.1 | 84.7 | 71.6 | 81.4 | 17.9 | 84.6 |
	| GPT-OSS 120B | 96.0 | 75.8 | 77.7 | 69.9 | 70.0 | 10.6 | 67.1 |
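
To make the comparison in the table easier to read, the short sketch below recomputes two things from the scores listed above: which model leads each benchmark, and how far INTELLECT-3 moves relative to GLM-4.5-Air. The numbers are copied from the table; the function names `leaders` and `gains` are hypothetical.

```python
BENCHMARKS = ["MATH-500", "AIME24", "AIME25", "LCB", "GPQA", "HLE", "MMLU-Pro"]

# Scores copied row by row from the evaluation table.
SCORES = {
    "INTELLECT-3": [98.1, 90.8, 88.0, 69.3, 74.4, 14.6, 81.9],
    "GLM-4.5-Air": [97.8, 84.6, 82.0, 61.5, 73.3, 13.3, 73.9],
    "GLM-4.5": [97.0, 85.8, 83.3, 64.5, 77.0, 14.8, 83.5],
    "DeepSeek R1 0528": [87.3, 83.2, 73.4, 62.5, 77.5, 15.9, 75.3],
    "DeepSeek v3.2": [96.8, 88.1, 84.7, 71.6, 81.4, 17.9, 84.6],
    "GPT-OSS 120B": [96.0, 75.8, 77.7, 69.9, 70.0, 10.6, 67.1],
}


def leaders(scores=SCORES):
    """Map each benchmark to the model with the highest score."""
    return {
        bench: max(scores, key=lambda model: scores[model][i])
        for i, bench in enumerate(BENCHMARKS)
    }


def gains(model="INTELLECT-3", baseline="GLM-4.5-Air", scores=SCORES):
    """Per-benchmark score difference (model minus baseline), in points."""
    return {
        bench: round(scores[model][i] - scores[baseline][i], 1)
        for i, bench in enumerate(BENCHMARKS)
    }


for bench, delta in gains().items():
    print(f"{bench:9s} {delta:+.1f}  (leader: {leaders()[bench]})")
```

On these numbers, INTELLECT-3 leads the three math benchmarks and DeepSeek v3.2 leads the remaining four, while the largest gains over GLM-4.5-Air are on MMLU-Pro and LCB.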

	## Model Variants

	| Model | HuggingFace |
	|-------|-------------|
	| INTELLECT-3 | [PrimeIntellect/INTELLECT-3](https://huggingface.co/PrimeIntellect/INTELLECT-3) |
	| INTELLECT-3-FP8 | [PrimeIntellect/INTELLECT-3-FP8](https://huggingface.co/PrimeIntellect/INTELLECT-3-FP8) |
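
A rough, weight-only memory estimate helps explain why the two variants need different hardware. The sketch below assumes 106B total parameters (from the Introduction), 2 bytes per parameter for BF16 and 1 byte for FP8, and 141 GB of memory per H200. It ignores KV cache, activations, and any layers an FP8 checkpoint may keep in higher precision, so treat the result as a lower bound rather than a sizing guide.

```python
import math

TOTAL_PARAMS = 106_000_000_000  # 106B total parameters, per the Introduction
H200_MEMORY_GB = 141            # assumed per-GPU memory of an H200
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_gb(dtype):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return TOTAL_PARAMS * BYTES_PER_PARAM[dtype] / 1e9


def min_gpus(dtype, gpu_memory_gb=H200_MEMORY_GB):
    """Smallest GPU count whose combined memory holds the weights."""
    return math.ceil(weight_gb(dtype) / gpu_memory_gb)


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_gb(dtype):.0f} GB of weights, at least {min_gpus(dtype)} x H200")
```

Under these assumptions the BF16 weights come to about 212 GB (two H200s) and the FP8 weights to about 106 GB (one H200), which is consistent with the GPU counts given in the serving section.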

	## Serving with vLLM

	The BF16 version can be served on 2x H200s:
	```bash
	vllm serve PrimeIntellect/INTELLECT-3 \
	--tensor-parallel-size 2 \
	--enable-auto-tool-choice \
	--tool-call-parser qwen3_coder \
	--reasoning-parser deepseek_r1
	```

	The FP8 version can be served on a single H200:

	```bash
	vllm serve PrimeIntellect/INTELLECT-3-FP8 \
	--enable-auto-tool-choice \
	--tool-call-parser qwen3_coder \
	--reasoning-parser deepseek_r1
	```
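
The serve commands above enable a tool-call parser and a reasoning parser, so a chat completion response may carry tool calls and a reasoning trace alongside (or instead of) plain content. The sketch below shows one way a client might build a tool-enabled request body and unpack the assistant message. Several things here are assumptions: the `get_weather` tool is hypothetical, the helper names are made up for illustration, the sample message is hand-written rather than real server output, and the reasoning text is assumed to arrive in a `reasoning_content` field (some vLLM versions may name it `reasoning`, so both are checked).

```python
import json

MODEL = "PrimeIntellect/INTELLECT-3"

# Hypothetical tool definition in the OpenAI function-calling format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_payload(prompt, tools=(WEATHER_TOOL,)):
    """Request body for POST /v1/chat/completions with tools attached."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": list(tools),
        "tool_choice": "auto",
    }


def summarize_message(message):
    """Pull reasoning, answer text, and parsed tool calls out of a message."""
    reasoning = message.get("reasoning_content") or message.get("reasoning") or ""
    calls = [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in message.get("tool_calls") or []
    ]
    return {
        "reasoning": reasoning,
        "answer": message.get("content") or "",
        "tool_calls": calls,
    }


# Hand-written example of what an assistant message might look like:
SAMPLE_MESSAGE = {
    "role": "assistant",
    "content": None,
    "reasoning_content": "The user wants the weather, so I should call the tool.",
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
        }
    ],
}

print(summarize_message(SAMPLE_MESSAGE))
```

The payload from `build_payload` can be posted to the server's `/v1/chat/completions` route with any HTTP client; the parsing step is kept separate so it can be checked without a running server.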

	## Citation

	```bibtex
	@misc{intellect3,
	title={INTELLECT-3: Technical Report},
	author={Prime Intellect Team},
	year={2025},
	url={https://huggingface.co/PrimeIntellect/INTELLECT-3}
	}
	```