Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published Feb 2 • 140
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published Jan 11 • 214 • 7
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning Paper • 2505.12448 • Published May 18, 2025 • 10
Unicorn: Text-Only Data Synthesis for Vision Language Model Training Paper • 2503.22655 • Published Mar 28, 2025 • 38
BELLE-2/Belle-whisper-large-v3-turbo-zh Automatic Speech Recognition • 0.8B • Updated Dec 16, 2024 • 425 • 75
Background Removal 🌘 Space • Remove image backgrounds and get transparent PNGs • Running on Zero • MCP • 2.84k
Dit Document Layout Analysis 👀 Space • Analyze document layout by uploading images • Runtime error • Agents • 13
Document Layout Analysis 🐠 Space • Segment document layouts into text, images, and tables • Running • Agents • 25
Document Layout Comparison 🔥 Space • Analyze document layout to identify text and elements • Sleeping • Agents • 7