Collections including paper arxiv:2509.08721

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24, 2025 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27, 2025 • 31
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28, 2025 • 124
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 4

Reinforcement Learning

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 230
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs

Paper • 2601.08763 • Published Jan 13 • 148
Transformers in Reinforcement Learning: A Survey

Paper • 2307.05979 • Published Jul 12, 2023 • 1
Comparing DPO with IPO and KTO

Collection

A collection of chat models to explore the differences between three alignment techniques: DPO, IPO, and KTO. • 56 items • Updated Jan 8, 2025 • 32

collections TEST Org

collections TEST ORG

PrivateXXOrganization/jjjjjj

Updated Jan 2
Qwen/Qwen-Image-Layered

Image-Text-to-Image • Updated Dec 19, 2025 • 20.9k • 1.01k
zai-org/GLM-4.7

Text Generation • Updated 28 days ago • 120k • 1.93k
bigai/TongSIM-Asset

Updated Dec 29, 2025 • 1.94k • 278

test

jjjjjjj$\{123*456\}jjjjjjj<%=123*567%>jjjjjjj\{\{123*678\}\}

deepseek-ai/DeepSeek-V3.2

Text Generation • 685B • Updated Dec 1, 2025 • 259k • 1.27k
Anthropic/AnthropicInterviewer

Viewer • Updated Jan 6 • 1.25k • 425 • 359
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 627
Qwen/Qwen-Image-Layered

Image-Text-to-Image • Updated Dec 19, 2025 • 20.9k • 1.01k

Research Article

Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Paper • 2509.08721 • Published Sep 10, 2025 • 662
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30, 2025 • 547
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 440
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Paper • 2508.18106 • Published Aug 25, 2025 • 348

How to crack the CTET 2026 exam in 7 days?

Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Paper • 2509.08721 • Published Sep 10, 2025 • 662

papers-most-view-by-month

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 299
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 231
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30, 2025 • 547
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 509

TradingAgents: Multi-Agents LLM Financial Trading Framework

Paper • 2412.20138 • Published Dec 28, 2024 • 18
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Paper • 2509.08721 • Published Sep 10, 2025 • 662
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 299
Memory in the Age of AI Agents

Paper • 2512.13564 • Published Dec 15, 2025 • 151

Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Paper • 2509.08721 • Published Sep 10, 2025 • 662

Project AI From Beginning to Ending

deepseek-ai/DeepSeek-OCR

Image-Text-to-Text • Updated Nov 4, 2025 • 3.25M • 3.17k
nvidia/PhysicalAI-Autonomous-Vehicles

Updated Jan 21 • 383k • 764
HuggingFaceFW/finewiki

Viewer • Updated Oct 22, 2025 • 61.6M • 6.28k • 285
MiniMaxAI/MiniMax-M2

Text Generation • Updated Dec 23, 2025 • 461k • 1.48k