Argonne 3.0-instruct

Argonne 3.0-instruct is a 2.88B-parameter instruction-tuned language model from the Argonne 3.x family. It is the SFT+DPO finetuned version of Argonne 3.0-base, trained on UltraChat (SFT) and KatoHF Chatbot Arena (DPO) datasets.

The base model was pretrained on ~76B tokens of FineWeb text at 1,024 context length. The instruct variant extends context to 13,568 tokens via RoPE extrapolation (θ = 1,000,000) and is trained for instruction following, dialogue, and multi-turn conversation.
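
To make the role of θ concrete, the sketch below computes RoPE rotation frequencies for θ = 1,000,000 and the head dimension of 256 listed in the architecture table, then counts how many channel pairs complete a full rotation inside the 1,024-token pretraining window and inside the 13,568-token instruct window. It assumes the conventional formulation inv_freq_i = θ^(-2i/d); the implementation in model.py may differ, and the function names here are illustrative only.

```python
import math

ROPE_THETA = 1_000_000.0   # from the model card
HEAD_DIM = 256             # from the architecture table
BASE_CTX = 1_024           # pretraining context length
INSTRUCT_CTX = 13_568      # extended context length


def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Conventional RoPE inverse frequencies, one per pair of channels (assumed)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def pairs_completing_rotation(inv_freq: list[float], context_len: int) -> int:
    """Count channel pairs whose wavelength (2*pi / freq) fits inside the context."""
    return sum(1 for f in inv_freq if 2 * math.pi / f <= context_len)


inv_freq = rope_inv_freq(HEAD_DIM, ROPE_THETA)
seen_at_base = pairs_completing_rotation(inv_freq, BASE_CTX)
seen_at_instruct = pairs_completing_rotation(inv_freq, INSTRUCT_CTX)

print(f"{len(inv_freq)} frequency pairs")
print(f"full rotation within {BASE_CTX} tokens: {seen_at_base}")
print(f"full rotation within {INSTRUCT_CTX} tokens: {seen_at_instruct}")
```

Channel pairs whose wavelength exceeds 1,024 tokens were only observed over part of a rotation during pretraining, which is one way to think about why behavior at longer positions relies on extrapolation.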

Model architecture

Component Specification
Parameters 2,882,162,688 (~2.88B)
Layers 24 transformer blocks
Hidden size 3,072
Attention heads 12 query / 4 key-value (GQA)
Head dimension 256
Feed-forward SwiGLU MLP, 8,192 intermediate dim
Attention pattern Interleaved local/global causal attention
Local attention window 256 tokens (every other layer)
Normalization RMSNorm with QK / V / sandwich norms
Position encoding RoPE (θ = 1,000,000)
Logit stabilization Final logit softcap = 15.0
Context length 13,568 tokens (RoPE extrapolated from 1,024-ctx base)
Vocabulary size 151,669
Tied embeddings Yes (input ↔ output)
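
As a sanity check on the table above, the following sketch tallies parameters from the listed dimensions. The projection and embedding terms follow directly from the table; the normalization breakdown (four hidden-size RMSNorm weights per block plus head-dimension Q, K, and V norm weights, no biases anywhere) is an assumption chosen because it reproduces the stated total, not a description confirmed by model.py.

```python
VOCAB = 151_669
HIDDEN = 3_072
LAYERS = 24
Q_HEADS, KV_HEADS, HEAD_DIM = 12, 4, 256
MLP_DIM = 8_192

# Attention projections with grouped-query attention (no biases assumed).
q_proj = HIDDEN * Q_HEADS * HEAD_DIM
k_proj = HIDDEN * KV_HEADS * HEAD_DIM
v_proj = HIDDEN * KV_HEADS * HEAD_DIM
o_proj = Q_HEADS * HEAD_DIM * HIDDEN
attention = q_proj + k_proj + v_proj + o_proj

# SwiGLU MLP: gate, up, and down projections.
mlp = 3 * HIDDEN * MLP_DIM

# ASSUMPTION: four hidden-size RMSNorm weights per block (sandwich norms around
# attention and MLP) plus head-dim-sized Q, K, and V norm weights.
norms_per_layer = 4 * HIDDEN + 3 * HEAD_DIM

embeddings = VOCAB * HIDDEN   # counted once because input and output are tied
final_norm = HIDDEN

total = LAYERS * (attention + mlp + norms_per_layer) + embeddings + final_norm
print(f"{total:,}")  # 2,882,162,688
```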

Training details

Stage 1: Supervised Fine-Tuning (SFT)

Item Value
Script sft.py
Dataset HuggingFaceH4/ultrachat_200k
Dataset recipe sft_ultrachat (system + user/assistant turns)
Context length 13,568 tokens
Batch size per GPU 10
Gradient accumulation 2
Effective batch 271,360 tokens/step
Optimizer AdamW (β₁=0.9, β₂=0.95, weight decay 0.1)
Peak learning rate 2.0e-5
Min LR ratio 0.1
Schedule Warmup-Stable-Decay; 200 warmup steps
Total optimizer steps 10,500
Epochs 1
Checkpoint cadence 30 minutes (time-based, save_total_limit=4)
Hardware 1× NVIDIA H200 GPU
Random seed 42
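
The effective batch size and the learning-rate schedule in the table can be sketched as follows. The batch arithmetic uses only values from the table. The schedule is a hedged illustration: the card gives the peak rate, the minimum ratio, the warmup length, and the total step count, but not the length or shape of the decay phase, so `DECAY_STEPS` and the linear shapes below are assumptions rather than what sft.py does.

```python
PEAK_LR = 2.0e-5
MIN_LR_RATIO = 0.1
WARMUP_STEPS = 200
TOTAL_STEPS = 10_500
DECAY_STEPS = 1_050  # ASSUMPTION: not stated in the card; 10% of training for illustration


def wsd_lr(step: int) -> float:
    """Learning rate at a 0-indexed optimizer step under an assumed linear WSD shape."""
    min_lr = PEAK_LR * MIN_LR_RATIO
    decay_start = TOTAL_STEPS - DECAY_STEPS
    if step < WARMUP_STEPS:
        # Linear warmup that reaches the peak on the last warmup step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    if step < decay_start:
        # Stable phase: hold the peak learning rate.
        return PEAK_LR
    # Decay phase: linear ramp from the peak down to PEAK_LR * MIN_LR_RATIO.
    progress = min(1.0, (step - decay_start + 1) / DECAY_STEPS)
    return PEAK_LR + (min_lr - PEAK_LR) * progress


# Effective batch: per-GPU batch x gradient accumulation x GPUs x context length.
tokens_per_step = 10 * 2 * 1 * 13_568
print(tokens_per_step)  # 271360

for s in (0, 199, 5_000, 10_499):
    print(s, wsd_lr(s))
```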

Stage 2: Direct Preference Optimization (DPO)

Item Value
Script dpo.py
Dataset KatoHF/chatbot_arena_binarized
Dataset recipe chat_refine_strict
Context length 13,568 tokens
Batch size per GPU 4
Gradient accumulation 2
Optimizer AdamW
Peak learning rate 1.0e-6
Beta (DPO temperature) 0.03
Score mode avg
Checkpoint cadence 30 minutes (time-based, save_total_limit=4)
Hardware 1× NVIDIA H200 GPU
Random seed 42
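
For readers unfamiliar with how beta and the score mode enter the objective, here is a minimal pure-Python sketch of the standard DPO loss for a single preference pair. It assumes that "Score mode avg" means each sequence is scored by its mean per-token log-probability; that reading, the function names, and the toy log-probabilities are illustrative assumptions, not taken from dpo.py.

```python
import math


def sequence_score(token_logps: list[float], mode: str = "avg") -> float:
    """Collapse per-token log-probs into one score ('avg' assumed length-normalized)."""
    total = sum(token_logps)
    return total / len(token_logps) if mode == "avg" else total


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected,
             beta: float = 0.03, mode: str = "avg") -> float:
    """Standard DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_ratio = sequence_score(policy_chosen, mode) - sequence_score(ref_chosen, mode)
    rejected_ratio = sequence_score(policy_rejected, mode) - sequence_score(ref_rejected, mode)
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) == log(1 + exp(-x)); adequate for the small margins here.
    return math.log1p(math.exp(-margin))


# Toy per-token log-probabilities (made up for illustration).
ref_chosen = [-1.2, -0.8, -1.0]
ref_rejected = [-1.1, -0.9]
policy_chosen = [-0.9, -0.6, -0.8]     # policy now prefers the chosen reply more
policy_rejected = [-1.4, -1.3]         # and the rejected reply less

print(dpo_loss(ref_chosen, ref_rejected, ref_chosen, ref_rejected))      # policy == reference
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

When the policy matches the reference the margin is zero and the loss equals log 2; it falls as the policy shifts probability toward the chosen response relative to the reference.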

Training data

Item Value
SFT corpus UltraChat 200k: multi-turn instruction-response pairs; see HuggingFaceH4/ultrachat_200k
DPO corpus KatoHF Chatbot Arena: binarized preference pairs from real user comparisons; see KatoHF/chatbot_arena_binarized
Tokenizer Qwen/Qwen3-0.6B-Base (151,669-token vocab), reused from the base model

Tokenizer

This model reuses the Qwen3 tokenizer (vocabulary size 151,669) through the Qwen2Tokenizer compatibility class. The tokenizer files are bundled with the checkpoint so no extra download is required.

Source code

Built from the GitHub main branch: https://github.com/PursuitOfDataScience/ArgonneAI/tree/main

Key scripts used to produce this checkpoint:

  • model.py: the ArgonneCausalLM / ArgonneConfig architecture (bundled here as model.py)
  • sft.py: supervised fine-tuning loop
  • dpo.py: DPO preference optimization loop

Inference

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "PursuitOfDataScience/argonne-3.0-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.bfloat16,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

messages = [
    {"role": "user", "content": "Explain what a black hole is in a way a 10-year-old would understand."}
]
prompt_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
)
input_ids = torch.tensor([prompt_ids], dtype=torch.long, device=device)

seed = 444
torch.manual_seed(seed)
if device.startswith("cuda"):
    torch.cuda.manual_seed_all(seed)

output_ids = model.generate(
    input_ids,
    max_length=input_ids.shape[1] + 200,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.3,
    no_repeat_ngram_size=4,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Recommended inference settings

Parameter Value
Context length 13,568 tokens
Temperature 0.8
Top-p 0.9
Repetition penalty 1.3
No-repeat n-gram size 4
Seed 444
Continuation length 200 new tokens

Usage notes

  • Load with trust_remote_code=True so the custom ArgonneCausalLM / ArgonneConfig classes (model.py) are registered.
  • Use apply_chat_template() for instruction prompts; the model ships with a Jinja2 chat template in tokenizer_config.json.
  • The custom generate method on ArgonneCausalLM uses max_length (total sequence length) rather than max_new_tokens; see the snippet above for the recommended pattern.
  • Weights are published as bf16 safetensors shards with a model.safetensors.index.json weight map for sharded loading.
  • The published context length is 13,568 tokens (RoPE extrapolated from the 1,024-ctx base).
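
Because generate takes a total-length budget, it can help to compute max_length from the prompt length and cap it at the published context window. The helper below is a hypothetical convenience function, not part of model.py; it only encodes the arithmetic described in the notes above.

```python
CONTEXT_LENGTH = 13_568  # published context length from the card


def total_max_length(prompt_len: int, new_tokens: int = 200,
                     context_length: int = CONTEXT_LENGTH) -> int:
    """Convert a desired number of new tokens into a total-length budget.

    Hypothetical helper: caps the result at the context window and rejects
    prompts that already fill it.
    """
    if prompt_len >= context_length:
        raise ValueError("prompt already fills the context window")
    return min(prompt_len + new_tokens, context_length)


print(total_max_length(30))       # 230
print(total_max_length(13_500))   # 13568 (capped at the context window)
```

With the inference snippet's variables this would be passed as max_length=total_max_length(input_ids.shape[1]); slicing output_ids[0][input_ids.shape[1]:] before decoding would then print only the newly generated tokens.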

Limitations

  • 2.88B parameters: significantly smaller than frontier models; expect weaker performance on complex reasoning, math, and code tasks.
  • Context length extended via RoPE extrapolation; long-context performance may degrade on tasks requiring precise retrieval beyond the original 1,024-ctx pretraining distribution.
  • SFT trained on UltraChat (English-only, curated conversation data); limited multilingual capability.
  • DPO trained on Chatbot Arena preference data; alignment quality depends on the preference dataset coverage.
  • No safety filtering or content moderation has been applied.

Citation

@misc{argonne30instruct,
  author = {PursuitOfDataScience},
  title = {Argonne 3.0-instruct},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/PursuitOfDataScience/argonne-3.0-instruct}
}