Video Understanding - a Henry665 Collection


Video Understanding

updated 4 days ago

AURA: Always-On Understanding and Real-Time Assistance via Video Streams

Paper • 2604.04184 • Published 22 days ago • 50
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Paper • 2403.09626 • Published Mar 14, 2024 • 15
ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Paper • 2506.01300 • Published Jun 2, 2025
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

Paper • 2503.21782 • Published Mar 27, 2025
VideoNSA: Native Sparse Attention Scales Video Understanding

Paper • 2510.02295 • Published Oct 2, 2025 • 10
VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

Paper • 2603.22285 • Published Mar 23 • 49
AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding

Paper • 2603.28696 • Published 27 days ago • 6
Dense Video Understanding with Gated Residual Tokenization

Paper • 2509.14199 • Published Sep 17, 2025 • 3
PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation

Paper • 2512.04025 • Published Dec 3, 2025 • 4
EventMemAgent: Hierarchical Event-Centric Memory for Online Video Understanding with Adaptive Tool Use

Paper • 2602.15329 • Published Feb 17 • 1
Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory

Paper • 2602.18434 • Published Feb 20
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge

Paper • 2501.13468 • Published Jan 23, 2025
Video Inference for Human Mesh Recovery with Vision Transformer

Paper • 2507.08981 • Published Jul 11, 2025 • 1
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Paper • 2501.12375 • Published Jan 21, 2025 • 23
V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval

Paper • 2512.12284 • Published Dec 13, 2025
HoliTom: Holistic Token Merging for Fast Video Large Language Models

Paper • 2505.21334 • Published May 27, 2025 • 21
Accurate and Fast Compressed Video Captioning

Paper • 2309.12867 • Published Sep 22, 2023
Unleashing Hour-Scale Video Training for Long Video-Language Understanding

Paper • 2506.05332 • Published Jun 5, 2025 • 3
Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning

Paper • 2601.21037 • Published Jan 28 • 15
Think-as-You-See: Streaming Chain-of-Thought Reasoning for Large Vision-Language Models

Paper • 2603.02872 • Published Mar 3 • 1
Efficient Video Sampling: Pruning Temporally Redundant Tokens for Faster VLM Inference

Paper • 2510.14624 • Published Oct 16, 2025 • 2
Rethinking Chain-of-Thought Reasoning for Videos

Paper • 2512.09616 • Published Dec 10, 2025 • 19
Inference Compute-Optimal Video Vision Language Models

Paper • 2505.18855 • Published May 24, 2025
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

Paper • 2402.03161 • Published Feb 5, 2024 • 16
Select Less, Reason More: Prioritizing Evidence Purity for Video Reasoning

Paper • 2510.15440 • Published Oct 17, 2025
When Thinking Drifts: Evidential Grounding for Robust Video Reasoning

Paper • 2510.06077 • Published Oct 7, 2025
VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding

Paper • 2412.02186 • Published Dec 3, 2024 • 23
VIDEOP2R: Video Understanding from Perception to Reasoning

Paper • 2511.11113 • Published Nov 14, 2025 • 112
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

Paper • 2505.16175 • Published May 22, 2025 • 42
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding

Paper • 2507.15028 • Published Jul 20, 2025 • 21
Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding

Paper • 2508.20478 • Published Aug 28, 2025 • 18
Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding?

Paper • 2505.14321 • Published May 20, 2025 • 11
Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding

Paper • 2512.05774 • Published Dec 5, 2025 • 7
LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning

Paper • 2509.24786 • Published Sep 29, 2025 • 7
Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method

Paper • 2501.00584 • Published Dec 31, 2024
DATE: Dynamic Absolute Time Enhancement for Long Video Understanding

Paper • 2509.09263 • Published Sep 11, 2025
Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos

Paper • 2408.14023 • Published Aug 26, 2024
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published 21 days ago • 234
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 46
Vidi2: Large Multimodal Models for Video Understanding and Creation

Paper • 2511.19529 • Published Nov 24, 2025 • 2
Contribution-aware Token Compression for Efficient Video Understanding via Reinforcement Learning

Paper • 2602.01649 • Published Feb 2
Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6, 2025 • 96
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 74
Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147
ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding

Paper • 2508.21496 • Published Aug 29, 2025 • 55
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

Paper • 2406.14515 • Published Jun 20, 2024 • 33
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Paper • 2501.08326 • Published Jan 14, 2025 • 34
Vidi: Large Multimodal Models for Video Understanding and Editing

Paper • 2504.15681 • Published Apr 22, 2025 • 14
SkillFormer: Unified Multi-View Video Understanding for Proficiency Estimation

Paper • 2505.08665 • Published May 13, 2025 • 5
Visual Context Window Extension: A New Perspective for Long Video Understanding

Paper • 2409.20018 • Published Sep 30, 2024 • 11
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance

Paper • 2411.02327 • Published Nov 4, 2024 • 11
VideoAds for Fast-Paced Video Understanding: Where Opensource Foundation Models Beat GPT-4o & Gemini-1.5 Pro

Paper • 2504.09282 • Published Apr 12, 2025
VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting

Paper • 2603.14659 • Published Mar 15 • 6
MOSS-ChatV: Reinforcement Learning with Process Reasoning Reward for Video Temporal Reasoning

Paper • 2509.21113 • Published Sep 25, 2025 • 6
Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models

Paper • 2511.23478 • Published Nov 28, 2025
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Paper • 2510.20579 • Published Oct 23, 2025 • 56
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Paper • 2501.05510 • Published Jan 9, 2025 • 44
Accelerating Streaming Video Large Language Models via Hierarchical Token Compression

Paper • 2512.00891 • Published Nov 30, 2025 • 16
A Simple Baseline for Streaming Video Understanding

Paper • 2604.02317 • Published 25 days ago • 72
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval

Paper • 2503.00540 • Published Mar 1, 2025 • 3
StreamChat: Chatting with Streaming Video

Paper • 2412.08646 • Published Dec 11, 2024 • 18
Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models

Paper • 2603.11896 • Published Mar 12 • 10
CurveStream: Boosting Streaming Video Understanding in MLLMs via Curvature-Aware Hierarchical Visual Memory Management

Paper • 2603.19571 • Published Mar 20 • 2
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding

Paper • 2411.03628 • Published Nov 6, 2024 • 2
VideoScaffold: Elastic-Scale Visual Hierarchies for Streaming Video Understanding in MLLMs

Paper • 2512.22226 • Published Dec 23, 2025
Recurrent Attention-based Token Selection for Efficient Streaming Video-LLMs

Paper • 2510.17364 • Published Oct 20, 2025
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding

Paper • 2504.13915 • Published Apr 10, 2025
VideoLLM-online: Online Video Large Language Model for Streaming Video

Paper • 2406.11816 • Published Jun 17, 2024 • 26
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation

Paper • 2408.16730 • Published Aug 29, 2024
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

Paper • 2504.17343 • Published Apr 24, 2025 • 13
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Paper • 2601.14724 • Published Jan 21 • 75
Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video

Paper • 2510.14560 • Published Oct 16, 2025
