rl__24GPU_base__swe_rebench_patched_oracle__r2egym-nl2bash-stack
RL-trained Qwen3-8B (81 steps, GRPO/RLOO-N)
- Base: Qwen/Qwen3-8B
- W&B: https://wandb.ai/dogml/OpenThoughts-Agent/runs/lptlutqj