Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe • Paper • 2604.13016 • Published 16 days ago • 87
UltraData • Collection • Ultra Scale, Ultra Quality, Ultra Coverage • 10 items • Updated 13 days ago • 81
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation • Paper • 2509.24663 • Published Sep 29, 2025 • 16