Training Data Efficiency in Multimodal Process Reward Models Paper • 2602.04145 • Published 6 days ago • 74
CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs Paper • 2602.03048 • Published 7 days ago • 33
Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation Paper • 2602.03619 • Published 6 days ago • 24
Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing Paper • 2602.03845 • Published 6 days ago • 24
TTCS: Test-Time Curriculum Synthesis for Self-Evolving Paper • 2601.22628 • Published 10 days ago • 33