Collections

Collections including paper arxiv:2603.15031

WTF GENIUS PAPERS

Papers that made me appreciate my major and my life a little more. obs=Observation, innov=Innovation. Most papers are about improving tiny models.

Diffusion Language Models Know the Answer Before Decoding

Paper • 2508.19982 • Published Aug 27, 2025 • 27
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Paper • 2512.13586 • Published Dec 15, 2025 • 93
LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

Paper • 2601.06431 • Published Jan 10 • 12
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning

Paper • 2601.09088 • Published Jan 14 • 63

Attention Residuals

Paper • 2603.15031 • Published Mar 16 • 180

Model_Architecture

Attention Residuals

Paper • 2603.15031 • Published Mar 16 • 180

Frontier Research Papers

Attention Residuals

Paper • 2603.15031 • Published Mar 16 • 180
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

Paper • 2510.19338 • Published Oct 22, 2025 • 117
Kimi Linear: An Expressive, Efficient Attention Architecture

Paper • 2510.26692 • Published Oct 30, 2025 • 132

The Last Prism • Corpus

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 120
Attention Residuals

Paper • 2603.15031 • Published Mar 16 • 180
Mixture-of-Depths Attention

Paper • 2603.15619 • Published Mar 16 • 80
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models

Paper • 2603.15557 • Published Mar 16 • 29

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

Paper • 2604.02268 • Published 17 days ago • 93
ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning

Paper • 2603.05863 • Published Mar 6 • 6
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning

Paper • 2604.02721 • Published 16 days ago • 361
GLM-5: from Vibe Coding to Agentic Engineering

Paper • 2602.15763 • Published Feb 17 • 144

Attention Residuals

Paper • 2603.15031 • Published Mar 16 • 180

MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

Paper • 2603.17187 • Published Mar 17 • 138
Attention Residuals

Paper • 2603.15031 • Published Mar 16 • 180
MOSS-TTS Technical Report

Paper • 2603.18090 • Published Mar 18 • 12
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

Paper • 2603.23516 • Published Mar 6 • 48

Attention Residuals

Paper • 2603.15031 • Published Mar 16 • 180

OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published Mar 10 • 151
Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

Paper • 2603.12228 • Published Mar 12 • 12
Efficient Memory Management for Large Language Model Serving with PagedAttention

Paper • 2309.06180 • Published Sep 12, 2023 • 53
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs

Paper • 2410.16144 • Published Oct 21, 2024 • 5
