RL4RLM: Training Native Recursive Language Models
Part of a collection of LoRA adapters (Qwen3-1.7B) for training RLMs via RL: SFT, STaR, DPO, and GRPO-v4. Code: github.com/pythonomar22/rl4rlm
LoRA adapter for Qwen3-1.7B trained as a Recursive Language Model (RLM) — a model that writes Python code to decompose and solve long-context tasks via a persistent REPL environment.
Training Native Recursive Language Models — CS234 Final Project, Stanford University (Winter 2026)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the Qwen3-1.7B base model, then apply the GRPO-v4 LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")
model = PeftModel.from_pretrained(base, "omar81939/rl4rlm-grpo-v4")
tokenizer = AutoTokenizer.from_pretrained("omar81939/rl4rlm-grpo-v4")
```
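The adapter is meant to drive a persistent Python REPL loop: the model emits code, a harness executes it in a long-lived namespace, and the captured stdout is fed back as the next observation. A minimal sketch of such an environment follows; the `PersistentREPL` class and the example turns are illustrative assumptions, not the project's actual harness.

```python
import io
import contextlib

class PersistentREPL:
    """Executes model-emitted Python in one shared namespace,
    so state persists across recursive turns (hypothetical helper)."""

    def __init__(self):
        self.namespace = {}

    def run(self, code: str) -> str:
        # Capture stdout so it can be returned to the model as an observation.
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, self.namespace)
        return buf.getvalue()

repl = PersistentREPL()
# Turn 1: the model might chunk a long document and stash it.
repl.run("chunks = ['doc part 1', 'doc part 2']")
# Turn 2: later model-emitted code reuses state from earlier turns.
observation = repl.run("print(len(chunks))")
print(observation)
```

The key design point is the single shared `namespace` dict: it is what lets the model decompose a long-context task into many small code turns without re-reading the full input each time.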
| Model | NIAH (100) | Multi-NIAH (24) | DocClassify (20) | Avg |
|---|---|---|---|---|
| Base | 72.0 | 38.3 | 80.3 | 63.5 |
| SFT | 90.0 | 57.9 | 82.4 | 76.8 |
| STaR | 87.0 | 58.4 | 83.4 | 76.3 |
| DPO | 83.0 | 87.9 | 82.6 | 84.5 |
| GRPO-v4 | 82.0 | 85.1 | 83.2 | 83.4 |