VIOLA: Towards Video In-Context Learning with Minimal Annotations Paper • 2601.15549 • Published 7 days ago • 4
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Paper • 2601.16163 • Published 7 days ago • 13
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models Paper • 2601.15224 • Published 8 days ago • 12
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published 12 days ago • 32
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience Paper • 2601.15876 • Published 7 days ago • 89
SOP: A Scalable Online Post-Training System for Vision-Language-Action Models Paper • 2601.03044 • Published 23 days ago • 28
Rethinking Video Generation Model for the Embodied World Paper • 2601.15282 • Published 8 days ago • 42
ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands Paper • 2512.24965 • Published 29 days ago • 41
NitroGen: An Open Foundation Model for Generalist Gaming Agents Paper • 2601.02427 • Published 25 days ago • 44
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation Paper • 2512.24271 • Published 30 days ago • 62
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields Paper • 2601.03252 • Published 23 days ago • 100
Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow Paper • 2512.24766 • Published 29 days ago • 9
MAI-UI Technical Report: Real-World Centric Foundation GUI Agents Paper • 2512.22047 • Published Dec 26, 2025 • 29
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition Paper • 2512.15603 • Published Dec 17, 2025 • 63