# Qwen3-4B-tau2-sft1
SFT checkpoint for tau2-bench tool-use tasks, trained from Qwen/Qwen3-4B-Instruct-2507 using the Slime tau2 training cookbook.
## Training summary
- Stage: SFT (supervised fine-tuning)
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Data: Jarrodbarnes/tau2-sft-seed-v3 (filtered, rejection-sampled trajectories)
- Code: https://github.com/THUDM/slime/tree/main/examples/tau-bench
- tau2-bench commit: 337326e62d8e0ca74c353b004a9c5d748e0ba914
## Key hyperparameters (from examples/tau-bench/training_cookbook.md)

- `num_epoch=2`
- `global_batch_size=16`
- `rollout_batch_size=16`
- `rollout_max_response_len=4096`
- `max_tokens_per_gpu=12288`
- `lr=1e-5` (cosine decay, warmup fraction 0.05)
- `weight_decay=0.01`
- `loss_mask_type=qwen3`
- `loss_type=sft_loss`
The training command is in `examples/tau-bench/tau2/run_sft.sh`.
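For reference, the learning-rate schedule implied by the settings above (`lr=1e-5`, cosine decay, warmup fraction 0.05) can be sketched as follows. The function name and the linear-warmup shape are illustrative assumptions; Slime's exact implementation may differ in detail.

```python
import math

def lr_at(step, total_steps, peak_lr=1e-5, warmup_frac=0.05):
    """Sketch of a cosine-decay LR schedule with linear warmup
    (assumed shape; see run_sft.sh for the authoritative config)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear warmup from ~0 up to the peak LR.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak LR down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

For a 1000-step run this warms up over the first 50 steps, peaks at 1e-5, and decays toward zero by the final step.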
## Evaluation (tau2-bench test split)
- Metric: pass@1 (any-success over 1 attempt)
- Domains: airline, retail, telecom (test split, 100 tasks)
- User simulator: gpt-4.1-mini
- Settings: `TAU2_USE_COMPRESSED_PROMPTS=0`, `TAU2_MAX_STEPS=100`
- Sampling: `num_samples=1`, `temperature=0.0`, `top_p=1.0`, `top_k=20`
## Results
- Overall pass@1: 0.40 (100 tasks)
- By domain:
- airline: 0.20 (20 tasks)
- retail: 0.60 (40 tasks)
- telecom: 0.30 (40 tasks)
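As a sanity check, the overall pass@1 above is the task-weighted mean of the per-domain numbers:

```python
# (pass@1, task count) per domain, as reported in this card.
domains = {"airline": (0.20, 20), "retail": (0.60, 40), "telecom": (0.30, 40)}

total_tasks = sum(n for _, n in domains.values())
overall = sum(p * n for p, n in domains.values()) / total_tasks
# overall ≈ 0.40 across 100 tasks
```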
## Reproduce pass@1

```bash
# Start policy server
CUDA_VISIBLE_DEVICES=0,1 python3 -m sglang.launch_server \
  --model-path Jarrodbarnes/Qwen3-4B-tau2-sft1 \
  --host 0.0.0.0 --port 30000 --tp 2 --mem-fraction-static 0.70

# Eval
python3 examples/tau-bench/tau2/eval.py \
  --hf-checkpoint Jarrodbarnes/Qwen3-4B-tau2-sft1 \
  --sglang-url http://127.0.0.1:30000/generate \
  --domains airline,retail,telecom --task-split test \
  --num-samples 1 --temperature 0.0 --top-p 1.0 --top-k 20 \
  --output "${TAU_BENCH_OUT_DIR}/tau2/eval/sft_pass1.json"
```
## Notes

- `eval.py` reports pass@k. The official tau2-bench leaderboard uses pass^k.
- Results are stochastic due to the user simulator; expect small variance across runs.
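To make the pass@k vs. pass^k distinction above concrete, here is a minimal sketch (helper names are illustrative, not from eval.py): pass@k counts a task as solved if any of the k attempts succeeds, while pass^k requires all k attempts to succeed.

```python
def pass_at_k(successes):
    """pass@k: task counts as solved if ANY of the k attempts succeeded."""
    return 1.0 if any(successes) else 0.0

def pass_hat_k(successes):
    """pass^k: task counts as solved only if ALL k attempts succeeded."""
    return 1.0 if all(successes) else 0.0

attempts = [True, False, True]  # one task, k=3 attempts
print(pass_at_k(attempts))   # 1.0
print(pass_hat_k(attempts))  # 0.0
```

With `num_samples=1`, the two metrics coincide, which is why this card reports pass@1.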
## Intended use
Research and reproduction of tau2-bench tool-use training. Not intended for deployment without additional safety evaluation.