-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 124 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
Collections
Discover the best community collections!
Collections including paper arxiv:2509.08721
-
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 230 -
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Paper • 2601.08763 • Published • 148 -
Transformers in Reinforcement Learning: A Survey
Paper • 2307.05979 • Published • 1
-
deepseek-ai/DeepSeek-V3.2
Text Generation • 685B • Updated • 259k • • 1.27k -
Anthropic/AnthropicInterviewer
Viewer • Updated • 1.25k • 425 • 359 -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 627 -
Qwen/Qwen-Image-Layered
Image-Text-to-Image • Updated • 20.9k • 1.01k
-
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper • 2509.08721 • Published • 662 -
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 547 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 440 -
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Paper • 2508.18106 • Published • 348
-
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Paper • 2511.18538 • Published • 299 -
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
Paper • 2511.14993 • Published • 231 -
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 547 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 509
-
TradingAgents: Multi-Agents LLM Financial Trading Framework
Paper • 2412.20138 • Published • 18 -
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper • 2509.08721 • Published • 662 -
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Paper • 2511.18538 • Published • 299 -
Memory in the Age of AI Agents
Paper • 2512.13564 • Published • 151
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 124 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 230 -
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Paper • 2601.08763 • Published • 148 -
Transformers in Reinforcement Learning: A Survey
Paper • 2307.05979 • Published • 1
-
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Paper • 2511.18538 • Published • 299 -
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
Paper • 2511.14993 • Published • 231 -
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 547 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 509
-
TradingAgents: Multi-Agents LLM Financial Trading Framework
Paper • 2412.20138 • Published • 18 -
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper • 2509.08721 • Published • 662 -
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Paper • 2511.18538 • Published • 299 -
Memory in the Age of AI Agents
Paper • 2512.13564 • Published • 151
-
deepseek-ai/DeepSeek-V3.2
Text Generation • 685B • Updated • 259k • • 1.27k -
Anthropic/AnthropicInterviewer
Viewer • Updated • 1.25k • 425 • 359 -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 627 -
Qwen/Qwen-Image-Layered
Image-Text-to-Image • Updated • 20.9k • 1.01k
-
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper • 2509.08721 • Published • 662 -
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published • 547 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 440 -
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Paper • 2508.18106 • Published • 348