Collections
Discover the best community collections!
Collections including paper arxiv:2509.08721
- Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
  Paper • 2509.08721 • Published • 662
- A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
  Paper • 2508.18106 • Published • 349
- VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
  Paper • 2509.09372 • Published • 247
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 233

- A Survey of Reinforcement Learning for Large Reasoning Models
  Paper • 2509.08827 • Published • 190
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 233
- Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
  Paper • 2509.07980 • Published • 105
- Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
  Paper • 2509.08721 • Published • 662

- Open Data Synthesis For Deep Research
  Paper • 2509.00375 • Published • 72
- Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
  Paper • 2509.03403 • Published • 23
- LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
  Paper • 2509.03405 • Published • 24
- SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
  Paper • 2509.00930 • Published • 5

- UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
  Paper • 2509.02544 • Published • 125
- Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
  Paper • 2509.08721 • Published • 662
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 233
- A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
  Paper • 2508.18106 • Published • 349

- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 277
- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 263
- GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
  Paper • 2507.01006 • Published • 251
- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 261

- USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning
  Paper • 2508.18966 • Published • 56
- Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
  Paper • 2509.08721 • Published • 662
- FastVLM: Efficient Vision Encoding for Vision Language Models
  Paper • 2412.13303 • Published • 75
- Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
  Paper • 2411.15466 • Published • 39