Libraries

How to use PursuitOfDataScience/argonne-3.0-instruct with Transformers:

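# Use a pipeline as a high-level helper
from transformers import pipeline

# trust_remote_code=True is required: the architecture is defined in the repo's model.py
pipe = pipeline(
    "text-generation",
    model="PursuitOfDataScience/argonne-3.0-instruct",
    trust_remote_code=True,
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (the causal-LM class keeps the language-model head needed for generation)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    "PursuitOfDataScience/argonne-3.0-instruct",
    trust_remote_code=True,
    dtype="auto",
)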

vLLM

How to use PursuitOfDataScience/argonne-3.0-instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "PursuitOfDataScience/argonne-3.0-instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PursuitOfDataScience/argonne-3.0-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
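
The same request can be issued from Python. The sketch below uses only the standard library and assumes the vLLM server from the previous step is listening on localhost:8000. `build_chat_request` and `ask` are illustrative helper names, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "PursuitOfDataScience/argonne-3.0-instruct"
# Assumption: the server started by `vllm serve` is reachable at this address.
BASE_URL = "http://localhost:8000"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl command (nothing is sent yet)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the assistant's reply (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Once the server is up, `print(ask("What is the capital of France?"))` mirrors the curl call. Passing `base_url="http://localhost:30000"` to `build_chat_request` targets an SGLang server instead.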

Use Docker

docker model run hf.co/PursuitOfDataScience/argonne-3.0-instruct

SGLang

How to use PursuitOfDataScience/argonne-3.0-instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "PursuitOfDataScience/argonne-3.0-instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PursuitOfDataScience/argonne-3.0-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "PursuitOfDataScience/argonne-3.0-instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PursuitOfDataScience/argonne-3.0-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Argonne 3.0-instruct

Argonne 3.0-instruct is a 2.88B-parameter instruction-tuned language model from the Argonne 3.x family. It is Argonne 3.0-base fine-tuned in two stages: supervised fine-tuning (SFT) on UltraChat, followed by Direct Preference Optimization (DPO) on the KatoHF Chatbot Arena preference dataset.

The base model was pretrained on ~76B tokens of FineWeb text with a 1,024-token context length. The instruct variant extends the context to 13,568 tokens via RoPE extrapolation (θ = 1,000,000) and is tuned for instruction following and multi-turn dialogue.
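
To give a feel for what a RoPE base of θ = 1,000,000 means at these context lengths, the following sketch computes the standard RoPE inverse frequencies and their wavelengths. It assumes the rotary embedding spans the full 256-dimensional head, which this card does not state, and the function names are illustrative rather than taken from model.py.

```python
import math

ROPE_THETA = 1_000_000.0
HEAD_DIM = 256        # assumption: rotary embedding covers the whole head dimension
BASE_CONTEXT = 1_024
EXTENDED_CONTEXT = 13_568


def rope_inv_freqs(head_dim: int = HEAD_DIM, theta: float = ROPE_THETA) -> list[float]:
    """Standard RoPE inverse frequencies: theta ** (-2i / d) for each dimension pair i."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def wavelengths(inv_freqs: list[float]) -> list[float]:
    """Number of positions each dimension pair needs to complete one full rotation."""
    return [2 * math.pi / f for f in inv_freqs]


waves = wavelengths(rope_inv_freqs())
# Pairs whose period exceeds a context length never wrap around inside it.
beyond_base = sum(w > BASE_CONTEXT for w in waves)
beyond_extended = sum(w > EXTENDED_CONTEXT for w in waves)
print(f"{len(waves)} pairs; {beyond_base} exceed {BASE_CONTEXT} positions, "
      f"{beyond_extended} exceed {EXTENDED_CONTEXT} positions")
```

The slow-rotating pairs are the ones that encounter unseen rotation angles when the context grows past the pretraining length, which is one plausible reason long-context quality can lag behind short-context quality.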

Model architecture

Component	Specification
Parameters	2,882,162,688 (~2.88B)
Layers	24 transformer blocks
Hidden size	3,072
Attention heads	12 query / 4 key-value (GQA)
Head dimension	256
Feed-forward	SwiGLU MLP, 8,192 intermediate dim
Attention pattern	Interleaved local/global causal attention
Local attention window	256 tokens (every other layer)
Normalization	RMSNorm with QK / V / sandwich norms
Position encoding	RoPE (θ = 1,000,000)
Logit stabilization	Final logit softcap = 15.0
Context length	13,568 tokens (RoPE extrapolated from 1,024-ctx base)
Vocabulary size	151,669
Tied embeddings	Yes (input ↔ output)
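
The parameter count in the table can be reproduced from the other rows. The breakdown below is a back-of-the-envelope sketch, not code from model.py, and it rests on assumptions the card does not spell out: no bias terms, four hidden-size RMSNorm weights per block (the sandwich norms), three head-dimension norm weights per block (Q, K, V), and one final RMSNorm.

```python
VOCAB = 151_669
HIDDEN = 3_072
LAYERS = 24
Q_HEADS, KV_HEADS, HEAD_DIM = 12, 4, 256
FFN_DIM = 8_192

# Tied embeddings: the input table doubles as the output projection, so count it once.
embedding = VOCAB * HIDDEN

# GQA: 12 query heads share 4 key/value heads (3 query heads per KV head).
q_proj = HIDDEN * Q_HEADS * HEAD_DIM
kv_proj = 2 * HIDDEN * KV_HEADS * HEAD_DIM
o_proj = Q_HEADS * HEAD_DIM * HIDDEN
attention = q_proj + kv_proj + o_proj

# SwiGLU uses three matrices: gate, up, and down projections.
mlp = 3 * HIDDEN * FFN_DIM

# Assumed norm layout (see the lead-in): 4 sandwich norms + Q/K/V norms per block.
norms_per_block = 4 * HIDDEN + 3 * HEAD_DIM
final_norm = HIDDEN

total = embedding + LAYERS * (attention + mlp + norms_per_block) + final_norm
print(f"{total:,}")  # 2,882,162,688
```

Under these assumptions the sum matches the reported 2,882,162,688 exactly, so the assumed layout is at least consistent with the published figure.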

Training details

Stage 1 — Supervised Fine-Tuning (SFT)

Item	Value
Script	`sft.py`
Dataset	HuggingFaceH4/ultrachat_200k
Dataset recipe	`sft_ultrachat` (system + user/assistant turns)
Context length	13,568 tokens
Batch size per GPU	10
Gradient accumulation	2
Effective batch	271,360 tokens/step
Optimizer	AdamW (β₁=0.9, β₂=0.95, weight decay 0.1)
Peak learning rate	2.0e-5
Min LR ratio	0.1
Schedule	Warmup-Stable-Decay; 200 warmup steps
Total optimizer steps	10,500
Epochs	1
Checkpoint cadence	30 minutes (time-based, `save_total_limit=4`)
Hardware	1× NVIDIA H200 GPU
Random seed	42
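
The effective batch figure follows directly from the rows above, and the schedule can be sketched in a few lines. This is an illustration rather than the code in `sft.py`: the decay length (`decay_steps`) and the linear shape of the warmup and decay phases are assumptions, since the card gives only the warmup length, the peak rate, and the minimum ratio.

```python
def tokens_per_step(batch_per_gpu: int, grad_accum: int, context_len: int, gpus: int = 1) -> int:
    """Upper bound on tokens per optimizer step (every sequence filled to context_len)."""
    return batch_per_gpu * grad_accum * context_len * gpus


def wsd_lr(
    step: int,
    peak_lr: float = 2.0e-5,
    min_lr_ratio: float = 0.1,
    warmup_steps: int = 200,
    total_steps: int = 10_500,
    decay_steps: int = 1_050,  # hypothetical: the card does not give the decay length
) -> float:
    """Warmup-Stable-Decay: linear warmup, flat plateau, linear decay to the floor."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    decay_start = total_steps - decay_steps
    if step < decay_start:
        return peak_lr
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr * (1.0 - (1.0 - min_lr_ratio) * progress)


print(tokens_per_step(10, 2, 13_568))  # 271360
for s in (0, 199, 5_000, 10_500):
    print(s, f"{wsd_lr(s):.2e}")
```

With a minimum ratio of 0.1, the rate ends at one tenth of the peak, i.e. 2.0e-6.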

Stage 2 — Direct Preference Optimization (DPO)

Item	Value
Script	`dpo.py`
Dataset	KatoHF/chatbot_arena_binarized
Dataset recipe	`chat_refine_strict`
Context length	13,568 tokens
Batch size per GPU	4
Gradient accumulation	2
Optimizer	AdamW
Peak learning rate	1.0e-6
Beta (DPO temperature)	0.03
Score mode	`avg`
Checkpoint cadence	30 minutes (time-based, `save_total_limit=4`)
Hardware	1× NVIDIA H200 GPU
Random seed	42
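
For readers unfamiliar with the objective, the standard DPO loss for a single preference pair can be written as below. This is a minimal pure-Python sketch, not the implementation in `dpo.py`; in particular, reading score mode `avg` as a length-normalized mean of per-token log-probabilities is an assumption.

```python
import math


def sequence_score(token_logps: list[float], mode: str = "avg") -> float:
    """Collapse per-token log-probs into one score per response."""
    total = sum(token_logps)
    # Assumption: "avg" means a length-normalized mean; any other mode falls back to the sum.
    return total / len(token_logps) if mode == "avg" else total


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected,
             beta: float = 0.03, mode: str = "avg") -> float:
    """-log sigmoid(beta * (chosen log-ratio - rejected log-ratio)) for one pair."""
    chosen_ratio = sequence_score(policy_chosen, mode) - sequence_score(ref_chosen, mode)
    rejected_ratio = sequence_score(policy_rejected, mode) - sequence_score(ref_rejected, mode)
    margin = beta * (chosen_ratio - rejected_ratio)
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy per-token log-probs: the policy favors the chosen answer more than the reference does.
loss = dpo_loss([-0.5, -0.5], [-3.0, -3.0], [-1.0, -1.0], [-1.0, -1.0])
print(f"{loss:.4f}")
```

When the policy and the reference agree, the margin is zero and the loss equals log 2; a small beta such as 0.03 keeps the margin, and therefore the pull away from the reference model, small.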

Training data

Item	Value
SFT corpus	UltraChat 200k — multi-turn instruction-response pairs; see HuggingFaceH4/ultrachat_200k
DPO corpus	KatoHF Chatbot Arena — binarized preference pairs from real user comparisons; see KatoHF/chatbot_arena_binarized
Tokenizer	Qwen/Qwen3-0.6B-Base (151,669-token vocab), reused from the base model

Tokenizer

This model reuses the Qwen3 tokenizer (vocabulary size 151,669) through the Qwen2Tokenizer compatibility class. The tokenizer files are bundled with the checkpoint, so no separate download is required.

Source code

Built from the GitHub main branch: https://github.com/PursuitOfDataScience/ArgonneAI/tree/main

Key scripts used to produce this checkpoint:

model.py — the ArgonneCausalLM / ArgonneConfig architecture (bundled here as model.py)
sft.py — supervised fine-tuning loop
dpo.py — DPO preference optimization loop

Inference

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "PursuitOfDataScience/argonne-3.0-instruct"

# trust_remote_code=True registers the custom ArgonneCausalLM / ArgonneConfig classes.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.bfloat16,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

messages = [
    {"role": "user", "content": "Explain what a black hole is in a way a 10-year-old would understand."}
]
# Render the conversation with the bundled chat template and tokenize it.
prompt_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=False,  # ask for a plain list of token ids
)
input_ids = torch.tensor([prompt_ids], dtype=torch.long, device=device)

# Seed the random number generators so sampling is reproducible.
seed = 444
torch.manual_seed(seed)
if device.startswith("cuda"):
    torch.cuda.manual_seed_all(seed)

# max_length counts prompt + continuation, so add the 200-token budget to the prompt length.
with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_length=input_ids.shape[1] + 200,
        temperature=0.8,
        top_p=0.9,
        do_sample=True,
        repetition_penalty=1.3,
        no_repeat_ngram_size=4,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Recommended inference settings

Parameter	Value
Context length	13,568 tokens
Temperature	0.8
Top-p	0.9
Repetition penalty	1.3
No-repeat n-gram size	4
Seed	444
Continuation length	200 new tokens
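
To make the temperature and top-p settings concrete, the sketch below shows how they reshape a next-token distribution. It is a simplified pure-Python illustration, not the sampling code used by Transformers or by ArgonneCausalLM, and it leaves out the repetition penalty and the n-gram block. The function name and the toy logits are hypothetical.

```python
import math


def nucleus_distribution(logits: list[float], temperature: float = 0.8,
                         top_p: float = 0.9) -> dict[int, float]:
    """Return {token_id: probability} after temperature scaling and top-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for a numerically stable softmax
    weights = [math.exp(x - peak) for x in scaled]
    norm = sum(weights)
    probs = [w / norm for w in weights]

    kept: dict[int, float] = {}
    cumulative = 0.0
    # Walk tokens from most to least likely; stop once the kept mass reaches top_p.
    for token_id in sorted(range(len(probs)), key=lambda i: probs[i], reverse=True):
        kept[token_id] = probs[token_id]
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break
    # Renormalize so the surviving tokens form a proper distribution.
    return {t: p / cumulative for t, p in kept.items()}


dist = nucleus_distribution([2.0, 1.0, 0.0, -3.0])
print(sorted(dist))  # [0, 1]
```

A temperature below 1 sharpens the distribution before the cut, so fewer tokens are needed to reach the 0.9 mass; in the toy example only the two most likely tokens survive.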

Usage notes

Load with trust_remote_code=True so the custom ArgonneCausalLM / ArgonneConfig classes (model.py) are registered.
Use apply_chat_template() for instruction prompts; the model ships with a Jinja2 chat template in tokenizer_config.json.
The custom generate method on ArgonneCausalLM uses max_length (total sequence length) rather than max_new_tokens; see the snippet above for the recommended pattern.
Weights are published as bf16 safetensors shards, with a model.safetensors.index.json weight map for sharded loading.
The published context length is 13,568 tokens (RoPE extrapolated from the 1,024-ctx base).
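
Because max_length is a total length, a "new tokens" budget has to be converted before calling generate. The helpers below sketch that conversion and cap it at the published context length. Both function names are hypothetical, and `continuation_only` assumes the returned sequence starts with the prompt tokens, which this card does not state explicitly.

```python
CONTEXT_LENGTH = 13_568  # published context length from this card


def total_max_length(prompt_len: int, new_tokens: int = 200,
                     context_len: int = CONTEXT_LENGTH) -> int:
    """Translate a new-token budget into the total-length value that max_length expects."""
    if prompt_len >= context_len:
        raise ValueError(
            f"prompt of {prompt_len} tokens leaves no room in a {context_len}-token context"
        )
    return min(prompt_len + new_tokens, context_len)


def continuation_only(output_ids: list[int], prompt_len: int) -> list[int]:
    """Drop the echoed prompt so only newly generated token ids remain (see assumption above)."""
    return output_ids[prompt_len:]


print(total_max_length(prompt_len=30))        # 230
print(total_max_length(prompt_len=13_500))    # 13568
print(continuation_only([1, 2, 3, 4, 5], 3))  # [4, 5]
```

In the inference snippet, `total_max_length(input_ids.shape[1])` would take the place of `input_ids.shape[1] + 200` and additionally guard against prompts that approach the context limit.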

Limitations

2.88B parameters — significantly smaller than frontier models; expect weaker performance on complex reasoning, math, and code tasks.
Context length extended via RoPE extrapolation; long-context performance may degrade on tasks requiring precise retrieval beyond the original 1,024-ctx pretraining distribution.
SFT trained on UltraChat (English-only, curated conversation data); limited multilingual capability.
DPO trained on Chatbot Arena preference data; alignment quality depends on the preference dataset coverage.
No safety filtering or content moderation has been applied.

Citation

@misc{argonne30instruct,
  author = {PursuitOfDataScience},
  title = {Argonne 3.0-instruct},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/PursuitOfDataScience/argonne-3.0-instruct}
}
