Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published 9 days ago • 87
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 10 days ago • 137
DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning Paper • 2605.25604 • Published 11 days ago • 134
Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving Paper • 2605.22809 • Published 15 days ago • 27
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation Paper • 2605.18739 • Published 18 days ago • 112
Video Analysis and Generation via a Semantic Progress Function Paper • 2604.22554 • Published Apr 24 • 63
HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System Paper • 2604.14125 • Published Apr 15 • 21
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published Apr 8 • 189
WorldAgents: Can Foundation Image Models be Agents for 3D World Models? Paper • 2603.19708 • Published Mar 20 • 13
3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model Paper • 2603.18524 • Published Mar 19 • 58
Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models Paper • 2603.15618 • Published Mar 16 • 21
VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos Paper • 2510.19488 • Published Oct 22, 2025 • 22
RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies Paper • 2603.04639 • Published Mar 4 • 31
TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics Paper • 2602.19313 • Published Feb 22 • 26
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning Paper • 2601.09708 • Published Jan 14 • 55