-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 24 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 152 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
Collections
Discover the best community collections!
Collections including paper arxiv:2510.08377
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 19 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
Paper • 2502.11573 • Published • 9 -
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper • 2502.02339 • Published • 23 -
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Paper • 2502.11775 • Published • 9 -
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 17 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 43
-
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 190 -
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Paper • 2508.02193 • Published • 136 -
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Paper • 2510.23607 • Published • 179 -
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
Paper • 2510.08673 • Published • 126
-
UniVideo: Unified Understanding, Generation, and Editing for Videos
Paper • 2510.08377 • Published • 81 -
LongLive: Real-time Interactive Long Video Generation
Paper • 2509.22622 • Published • 188 -
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
Paper • 2509.08519 • Published • 128
-
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Paper • 2506.09113 • Published • 106 -
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
Paper • 2506.08009 • Published • 30 -
Seeing Voices: Generating A-Roll Video from Audio with Mirage
Paper • 2506.08279 • Published • 27 -
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
Paper • 2506.07848 • Published • 4
-
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Paper • 2412.07744 • Published • 20 -
Video Motion Transfer with Diffusion Transformers
Paper • 2412.07776 • Published • 17 -
ObjCtrl-2.5D: Training-free Object Control with Camera Poses
Paper • 2412.07721 • Published • 9 -
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance
Paper • 2412.05355 • Published • 8
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 24 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 152 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 17 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 43
-
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 190 -
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Paper • 2508.02193 • Published • 136 -
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Paper • 2510.23607 • Published • 179 -
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
Paper • 2510.08673 • Published • 126
-
UniVideo: Unified Understanding, Generation, and Editing for Videos
Paper • 2510.08377 • Published • 81 -
LongLive: Real-time Interactive Long Video Generation
Paper • 2509.22622 • Published • 188 -
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
Paper • 2509.08519 • Published • 128
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 19 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Paper • 2506.09113 • Published • 106 -
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
Paper • 2506.08009 • Published • 30 -
Seeing Voices: Generating A-Roll Video from Audio with Mirage
Paper • 2506.08279 • Published • 27 -
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
Paper • 2506.07848 • Published • 4
-
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
Paper • 2502.11573 • Published • 9 -
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper • 2502.02339 • Published • 23 -
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Paper • 2502.11775 • Published • 9 -
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39
-
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Paper • 2412.07744 • Published • 20 -
Video Motion Transfer with Diffusion Transformers
Paper • 2412.07776 • Published • 17 -
ObjCtrl-2.5D: Training-free Object Control with Camera Poses
Paper • 2412.07721 • Published • 9 -
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance
Paper • 2412.05355 • Published • 8