Open-R1: a fully open reproduction of DeepSeek-R1 +1 eliebak, lvwerra, lewtun • Jan 28, 2025 • 889
Reinforcement Learning for Large Language Models: Beyond the Agent Paradigm royswastik • Mar 19, 2025 • 8