Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published Apr 27 • 71
VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward Paper • 2603.26599 • Published Mar 27 • 66
HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming Paper • 2512.21338 • Published Dec 24, 2025 • 23
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory Paper • 2512.07802 • Published Dec 8, 2025 • 46
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Paper • 2512.02014 • Published Dec 1, 2025 • 78
Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation Paper • 2410.22489 • Published Oct 29, 2024 • 1
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model Paper • 2503.16282 • Published Mar 20, 2025 • 6