Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2510.08377

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

about 17 hours ago

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 24
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 152
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

rachfop/unclothe

Updated Sep 29, 2024 • 3
Running on Zero

MCP

16

Qwen Image Fast

🖼

16

Generate images in 8-steps
Qwen/Qwen-Image

Text-to-Image • Updated Aug 18, 2025 • 153k • • 2.39k
nvidia/ToolScale

Viewer • Updated Dec 17, 2025 • 4.06k • 445 • 188

VideoGeneration

UniVideo: Unified Understanding, Generation, and Editing for Videos

Paper • 2510.08377 • Published Oct 9, 2025 • 81
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation

Paper • 2510.01284 • Published Sep 30, 2025 • 37

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13, 2025 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14, 2025 • 19
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6, 2025 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19, 2025 • 48

Multimodal Reasoning

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17, 2025 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4, 2025 • 23
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17, 2025 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 189
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 43

A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10, 2025 • 190
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Paper • 2508.02193 • Published Aug 4, 2025 • 136
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations

Paper • 2510.23607 • Published Oct 27, 2025 • 179
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

Paper • 2510.08673 • Published Oct 9, 2025 • 126

Video Generation

UniVideo: Unified Understanding, Generation, and Editing for Videos

Paper • 2510.08377 • Published Oct 9, 2025 • 81
LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26, 2025 • 188
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Paper • 2509.08519 • Published Sep 10, 2025 • 128

Video Generation

Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10, 2025 • 106
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Paper • 2506.08009 • Published Jun 9, 2025 • 30
Seeing Voices: Generating A-Roll Video from Audio with Mirage

Paper • 2506.08279 • Published Jun 9, 2025 • 27
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement

Paper • 2506.07848 • Published Jun 9, 2025 • 4

Video Generation Control-Style Transfer

StyleMaster: Stylize Your Video with Artistic Generation and Translation

Paper • 2412.07744 • Published Dec 10, 2024 • 20
Video Motion Transfer with Diffusion Transformers

Paper • 2412.07776 • Published Dec 10, 2024 • 17
ObjCtrl-2.5D: Training-free Object Control with Camera Poses

Paper • 2412.07721 • Published Dec 10, 2024 • 9
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance

Paper • 2412.05355 • Published Dec 6, 2024 • 8

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

about 17 hours ago

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 24
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 152
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 189
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 43

rachfop/unclothe

Updated Sep 29, 2024 • 3
Running on Zero

MCP

16

Qwen Image Fast

🖼

16

Generate images in 8-steps
Qwen/Qwen-Image

Text-to-Image • Updated Aug 18, 2025 • 153k • • 2.39k
nvidia/ToolScale

Viewer • Updated Dec 17, 2025 • 4.06k • 445 • 188

A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10, 2025 • 190
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Paper • 2508.02193 • Published Aug 4, 2025 • 136
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations

Paper • 2510.23607 • Published Oct 27, 2025 • 179
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

Paper • 2510.08673 • Published Oct 9, 2025 • 126

VideoGeneration

UniVideo: Unified Understanding, Generation, and Editing for Videos

Paper • 2510.08377 • Published Oct 9, 2025 • 81
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation

Paper • 2510.01284 • Published Sep 30, 2025 • 37

Video Generation

UniVideo: Unified Understanding, Generation, and Editing for Videos

Paper • 2510.08377 • Published Oct 9, 2025 • 81
LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26, 2025 • 188
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Paper • 2509.08519 • Published Sep 10, 2025 • 128

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13, 2025 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14, 2025 • 19
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6, 2025 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19, 2025 • 48

Video Generation

Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10, 2025 • 106
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Paper • 2506.08009 • Published Jun 9, 2025 • 30
Seeing Voices: Generating A-Roll Video from Audio with Mirage

Paper • 2506.08279 • Published Jun 9, 2025 • 27
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement

Paper • 2506.07848 • Published Jun 9, 2025 • 4

Multimodal Reasoning

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17, 2025 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4, 2025 • 23
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17, 2025 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

Video Generation Control-Style Transfer

StyleMaster: Stylize Your Video with Artistic Generation and Translation

Paper • 2412.07744 • Published Dec 10, 2024 • 20
Video Motion Transfer with Diffusion Transformers

Paper • 2412.07776 • Published Dec 10, 2024 • 17
ObjCtrl-2.5D: Training-free Object Control with Camera Poses

Paper • 2412.07721 • Published Dec 10, 2024 • 9
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance

Paper • 2412.05355 • Published Dec 6, 2024 • 8

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs