Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 277
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 188
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 143
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
Paper • 2505.04921 • Published • 186
Shifting AI Efficiency From Model-Centric to Data-Centric Compression
Paper • 2505.19147 • Published • 145
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper • 2505.10554 • Published • 120