Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2603.08660

The Art of Efficient Reasoning: Data, Reward, and Optimization

Paper • 2602.20945 • Published Feb 24 • 7
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Paper • 2309.00267 • Published Sep 1, 2023 • 53
Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning

Paper • 2512.04359 • Published Dec 4, 2025
How Far Can Unsupervised RLVR Scale LLM Training?

Paper • 2603.08660 • Published Mar 9 • 58

Finetuning Strategies

MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Paper • 2507.21183 • Published Jul 27, 2025 • 15
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

Paper • 2507.21802 • Published Jul 29, 2025 • 19
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity

Paper • 2507.21848 • Published Jul 29, 2025 • 9
Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 161

Reinforcement learning

about 16 hours ago

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30, 2024 • 24
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4, 2025 • 104
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25, 2025 • 75

My notification

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Paper • 2601.15369 • Published Jan 21 • 21
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model

Paper • 2601.15892 • Published Jan 22 • 53
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Paper • 2601.16208 • Published Jan 22 • 55
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems

Paper • 2601.11004 • Published Jan 16 • 30

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8, 2025 • 208 • 99
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 39
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30, 2025 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23, 2025 • 88

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 107
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Paper • 2310.11511 • Published Oct 17, 2023 • 80
In-Context Learning Creates Task Vectors

Paper • 2310.15916 • Published Oct 24, 2023 • 43
Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 45

The Art of Efficient Reasoning: Data, Reward, and Optimization

Paper • 2602.20945 • Published Feb 24 • 7
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Paper • 2309.00267 • Published Sep 1, 2023 • 53
Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning

Paper • 2512.04359 • Published Dec 4, 2025
How Far Can Unsupervised RLVR Scale LLM Training?

Paper • 2603.08660 • Published Mar 9 • 58

My notification

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Paper • 2601.15369 • Published Jan 21 • 21
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model

Paper • 2601.15892 • Published Jan 22 • 53
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Paper • 2601.16208 • Published Jan 22 • 55
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems

Paper • 2601.11004 • Published Jan 16 • 30

Finetuning Strategies

MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Paper • 2507.21183 • Published Jul 27, 2025 • 15
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

Paper • 2507.21802 • Published Jul 29, 2025 • 19
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity

Paper • 2507.21848 • Published Jul 29, 2025 • 9
Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 161

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8, 2025 • 208 • 99
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 39
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30, 2025 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23, 2025 • 88

Reinforcement learning

about 16 hours ago

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30, 2024 • 24
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4, 2025 • 104
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25, 2025 • 75

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 107
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Paper • 2310.11511 • Published Oct 17, 2023 • 80
In-Context Learning Creates Task Vectors

Paper • 2310.15916 • Published Oct 24, 2023 • 43
Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 45

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs