Argonne 3.0-instruct
Argonne 3.0-instruct is a 2.88B-parameter instruction-tuned language model from the Argonne 3.x family. It is the SFT+DPO fine-tuned version of Argonne 3.0-base, trained on the UltraChat (SFT) and KatoHF Chatbot Arena (DPO) datasets.
The base model was pretrained on ~76B tokens of FineWeb text at a 1,024-token context length. The instruct variant extends the context to 13,568 tokens via RoPE extrapolation (θ = 1,000,000) and is trained for instruction following, dialogue, and multi-turn conversation.
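
To give a feel for what a RoPE base of θ = 1,000,000 means at this context length, here is a minimal sketch of the rotary frequencies. It assumes the standard RoPE formulation (inverse frequency θ^(-2i/d) per dimension pair) and the head dimension of 256 listed in the architecture table; the model's own implementation in model.py may differ in detail.

```python
import math

def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Standard RoPE inverse frequencies: theta^(-2i/d) for each dimension pair i."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]

HEAD_DIM = 256        # head dimension listed for this model
THETA = 1_000_000.0   # RoPE base listed for this model
CONTEXT = 13_568      # published instruct context length

inv_freq = rope_inv_freq(HEAD_DIM, THETA)

# Wavelength (in tokens) of each rotary pair: positions one wavelength apart
# receive the same rotation for that pair.
wavelengths = [2 * math.pi / f for f in inv_freq]

# Pairs whose wavelength exceeds the context never complete a full rotation
# inside a 13,568-token window.
slow_pairs = sum(1 for w in wavelengths if w > CONTEXT)
print(f"{len(inv_freq)} rotary pairs, {slow_pairs} with wavelength > {CONTEXT} tokens")
print(f"longest wavelength: {max(wavelengths):,.0f} tokens")
```

A large base stretches the slowest rotations far past the context window, which is the usual motivation for raising θ when extending context.
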
Model architecture
| Component | Specification |
|---|---|
| Parameters | 2,882,162,688 (~2.88B) |
| Layers | 24 transformer blocks |
| Hidden size | 3,072 |
| Attention heads | 12 query / 4 key-value (GQA) |
| Head dimension | 256 |
| Feed-forward | SwiGLU MLP, 8,192 intermediate dim |
| Attention pattern | Interleaved local/global causal attention |
| Local attention window | 256 tokens (every other layer) |
| Normalization | RMSNorm with QK / V / sandwich norms |
| Position encoding | RoPE (θ = 1,000,000) |
| Logit stabilization | Final logit softcap = 15.0 |
| Context length | 13,568 tokens (RoPE extrapolated from 1,024-ctx base) |
| Vocabulary size | 151,669 |
| Tied embeddings | Yes (input ↔ output) |
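
As a sanity check on the table, the parameter count can be rebuilt from the listed dimensions. The sketch below assumes bias-free linear layers and one particular norm layout (four hidden-size RMSNorm weights per block, Q/K/V norms of size 256 each, and one final norm). That layout is an assumption inferred from the "QK / V / sandwich norms" row, not something the card spells out; it does reproduce the reported total, but that agreement alone does not confirm it.

```python
# Values taken from the architecture table.
VOCAB = 151_669
HIDDEN = 3_072
LAYERS = 24
Q_HEADS, KV_HEADS, HEAD_DIM = 12, 4, 256
FFN = 8_192

# Tied embeddings: the input matrix doubles as the output projection.
embedding = VOCAB * HIDDEN

# Attention projections (assumed bias-free): Q and O use all query heads,
# K and V use the smaller set of key-value heads (GQA).
q_proj = HIDDEN * Q_HEADS * HEAD_DIM
k_proj = HIDDEN * KV_HEADS * HEAD_DIM
v_proj = HIDDEN * KV_HEADS * HEAD_DIM
o_proj = Q_HEADS * HEAD_DIM * HIDDEN
attention = q_proj + k_proj + v_proj + o_proj

# SwiGLU MLP (assumed bias-free): gate, up, and down projections.
mlp = 3 * HIDDEN * FFN

# Assumed norm layout: four hidden-size RMSNorm weights per block (sandwich
# norms around attention and MLP) plus Q, K, and V norms of size HEAD_DIM.
norms_per_layer = 4 * HIDDEN + 3 * HEAD_DIM
final_norm = HIDDEN

total = embedding + LAYERS * (attention + mlp + norms_per_layer) + final_norm
print(f"{total:,}")  # 2,882,162,688
```

Roughly 466M of the total sits in the tied embedding matrix, a consequence of pairing a 151,669-token vocabulary with a 3,072-wide model.
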
Training details
Stage 1: Supervised Fine-Tuning (SFT)
| Item | Value |
|---|---|
| Script | sft.py |
| Dataset | HuggingFaceH4/ultrachat_200k |
| Dataset recipe | sft_ultrachat (system + user/assistant turns) |
| Context length | 13,568 tokens |
| Batch size per GPU | 10 |
| Gradient accumulation | 2 |
| Effective batch | 271,360 tokens/step |
| Optimizer | AdamW (β₁=0.9, β₂=0.95, weight decay 0.1) |
| Peak learning rate | 2.0e-5 |
| Min LR ratio | 0.1 |
| Schedule | Warmup-Stable-Decay; 200 warmup steps |
| Total optimizer steps | 10,500 |
| Epochs | 1 |
| Checkpoint cadence | 30 minutes (time-based, save_total_limit=4) |
| Hardware | 1× NVIDIA H200 GPU |
| Random seed | 42 |
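
The effective batch and the learning-rate schedule in the table can be sketched as follows. The peak rate, minimum ratio, warmup length, and step count come from the table; the point at which decay begins (`decay_fraction`) and the linear decay shape are hypothetical, since the card does not state them, and sft.py may implement the schedule differently.

```python
def wsd_lr(step, peak_lr=2.0e-5, min_ratio=0.1, warmup_steps=200,
           total_steps=10_500, decay_fraction=0.1):
    """Warmup-Stable-Decay learning rate at a given optimizer step.

    decay_fraction and the linear decay shape are hypothetical; the table
    does not say when or how the decay phase runs.
    """
    decay_steps = int(total_steps * decay_fraction)
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Linear warmup from near zero to the peak.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay phase: linear ramp from the peak down to min_ratio * peak.
    progress = min(1.0, (step - decay_start) / max(1, decay_steps))
    return peak_lr * (1.0 - (1.0 - min_ratio) * progress)

# Effective batch in tokens: per-GPU batch x grad accumulation x GPUs x context.
tokens_per_step = 10 * 2 * 1 * 13_568
print(tokens_per_step)  # 271360

for s in (0, 199, 5_000, 10_499):
    print(s, f"{wsd_lr(s):.2e}")
```

The token figure assumes every sequence fills the full 13,568-token context, which matches the 271,360 tokens/step listed above.
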
Stage 2: Direct Preference Optimization (DPO)
| Item | Value |
|---|---|
| Script | dpo.py |
| Dataset | KatoHF/chatbot_arena_binarized |
| Dataset recipe | chat_refine_strict |
| Context length | 13,568 tokens |
| Batch size per GPU | 4 |
| Gradient accumulation | 2 |
| Optimizer | AdamW |
| Peak learning rate | 1.0e-6 |
| Beta (DPO temperature) | 0.03 |
| Score mode | avg |
| Checkpoint cadence | 30 minutes (time-based, save_total_limit=4) |
| Hardware | 1× NVIDIA H200 GPU |
| Random seed | 42 |
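
For readers unfamiliar with the role of beta, here is a minimal pure-Python sketch of the standard DPO objective for a single preference pair. It reads the table's "avg" score mode as a length-normalized mean of per-token log-probabilities, which is an assumption; the function names are illustrative and are not taken from dpo.py.

```python
import math

def sequence_score(token_logprobs, mode="avg"):
    """Collapse per-token log-probs into one score per response.

    "avg" is read here as a length-normalized mean; this is an assumed
    meaning of the table's score mode, not confirmed by the source.
    """
    total = sum(token_logprobs)
    return total / len(token_logprobs) if mode == "avg" else total

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected,
             beta=0.03, mode="avg"):
    """Standard DPO loss for one preference pair: -log sigmoid(beta * margin)."""
    chosen_gain = sequence_score(policy_chosen, mode) - sequence_score(ref_chosen, mode)
    rejected_gain = sequence_score(policy_rejected, mode) - sequence_score(ref_rejected, mode)
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); fine for the small margins used here.
    return math.log1p(math.exp(-margin))

# Toy per-token log-probs (made up for illustration).
ref_c, ref_r = [-1.0, -2.0, -1.5], [-1.2, -1.8]
baseline = dpo_loss(ref_c, ref_r, ref_c, ref_r)  # policy identical to reference
improved = dpo_loss([-0.5, -1.0, -1.0], [-2.0, -2.5], ref_c, ref_r)
print(baseline, improved)
```

When the policy matches the reference the loss is log 2; it falls as the policy raises the chosen response's score relative to the rejected one. A small beta such as 0.03 keeps that pressure gentle.
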
Training data
| Item | Value |
|---|---|
| SFT corpus | UltraChat 200k: multi-turn instruction-response pairs; see HuggingFaceH4/ultrachat_200k |
| DPO corpus | KatoHF Chatbot Arena: binarized preference pairs from real user comparisons; see KatoHF/chatbot_arena_binarized |
| Tokenizer | Qwen/Qwen3-0.6B-Base (151,669-token vocab), reused from the base model |
Tokenizer
This model reuses the Qwen3 tokenizer (vocabulary size 151,669) through the Qwen2Tokenizer compatibility class. The tokenizer files are bundled with the checkpoint, so no extra download is required.
Source code
Built from the GitHub main branch: https://github.com/PursuitOfDataScience/ArgonneAI/tree/main
Key scripts used to produce this checkpoint:
- model.py: the ArgonneCausalLM / ArgonneConfig architecture (bundled here as model.py)
- sft.py: supervised fine-tuning loop
- dpo.py: DPO preference optimization loop
Inference
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PursuitOfDataScience/argonne-3.0-instruct"

# trust_remote_code=True registers the custom ArgonneCausalLM / ArgonneConfig classes.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.bfloat16,
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

messages = [
    {"role": "user", "content": "Explain what a black hole is in a way a 10-year-old would understand."}
]
# Render the conversation with the bundled chat template and tokenize it.
prompt_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
)
input_ids = torch.tensor([prompt_ids], dtype=torch.long, device=device)

# Seed the sampler so the output is reproducible.
seed = 444
torch.manual_seed(seed)
if device.startswith("cuda"):
    torch.cuda.manual_seed_all(seed)

# max_length counts prompt plus generated tokens, so add the prompt length.
output_ids = model.generate(
    input_ids,
    max_length=input_ids.shape[1] + 200,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.3,
    no_repeat_ngram_size=4,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
Recommended inference settings
| Parameter | Value |
|---|---|
| Context length | 13,568 tokens |
| Temperature | 0.8 |
| Top-p | 0.9 |
| Repetition penalty | 1.3 |
| No-repeat n-gram size | 4 |
| Seed | 444 |
| Continuation length | 200 new tokens |
Usage notes
- Load with trust_remote_code=True so the custom ArgonneCausalLM / ArgonneConfig classes (model.py) are registered.
- Use apply_chat_template() for instruction prompts; the model ships with a Jinja2 chat template in tokenizer_config.json.
- The custom generate method on ArgonneCausalLM uses max_length (total sequence length) rather than max_new_tokens; see the snippet above for the recommended pattern.
- Weights are published as bf16 safetensors shards with a model.safetensors.index.json weight map for sharded loading.
- The published context length is 13,568 tokens (RoPE extrapolated from the 1,024-ctx base).
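
Because max_length counts the prompt as well as the continuation, it can help to wrap the recommended settings in a small helper. The sketch below is a hypothetical convenience function (generation_kwargs is not part of the model's code) that converts a desired number of new tokens into a total-length budget and clamps it to the published context length.

```python
MAX_CONTEXT = 13_568  # published context length

def generation_kwargs(prompt_len: int, new_tokens: int = 200) -> dict:
    """Build generate() keyword arguments from the recommended settings.

    Hypothetical helper: converts a desired number of new tokens into the
    total-length max_length budget, clamped to the context window.
    """
    if prompt_len >= MAX_CONTEXT:
        raise ValueError(f"prompt of {prompt_len} tokens leaves no room to generate")
    return {
        "max_length": min(prompt_len + new_tokens, MAX_CONTEXT),
        "temperature": 0.8,
        "top_p": 0.9,
        "do_sample": True,
        "repetition_penalty": 1.3,
        "no_repeat_ngram_size": 4,
    }

print(generation_kwargs(prompt_len=32)["max_length"])      # 232
print(generation_kwargs(prompt_len=13_500)["max_length"])  # 13568
```

With the inference snippet's variables in scope, a call would look like model.generate(input_ids, **generation_kwargs(input_ids.shape[1])).
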
Limitations
- 2.88B parameters: significantly smaller than frontier models; expect weaker performance on complex reasoning, math, and code tasks.
- Context length extended via RoPE extrapolation; long-context performance may degrade on tasks requiring precise retrieval beyond the original 1,024-ctx pretraining distribution.
- SFT trained on UltraChat (English-only, curated conversation data); limited multilingual capability.
- DPO trained on Chatbot Arena preference data; alignment quality depends on the preference dataset coverage.
- No safety filtering or content moderation has been applied.
Citation
@misc{argonne30instruct,
author = {PursuitOfDataScience},
title = {Argonne 3.0-instruct},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/PursuitOfDataScience/argonne-3.0-instruct}
}