When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models Paper • 2602.10179 • Published Feb 10 • 6
Olaf-World: Orienting Latent Actions for Video World Modeling Paper • 2602.10104 • Published Feb 10 • 27
Glance: Accelerating Diffusion Models with 1 Sample Paper • 2512.02899 • Published Dec 2, 2025 • 30
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation Paper • 2511.11434 • Published Nov 14, 2025 • 47
Sailor2 Language Models Collection Sailing in South-East Asia with Inclusive Multilingual LLMs • 32 items • Updated 20 days ago • 30
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published Nov 4, 2025 • 103
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback Paper • 2511.01678 • Published Nov 3, 2025 • 38
From Charts to Code: A Hierarchical Benchmark for Multimodal Models Paper • 2510.17932 • Published Oct 20, 2025 • 8
Paper2Video: Automatic Video Generation from Scientific Papers Paper • 2510.05096 • Published Oct 6, 2025 • 120
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Paper • 2504.06148 • Published Apr 8, 2025 • 13