paper to review
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Paper
• 2312.02087
• Published
• 22
FaceStudio: Put Your Face Everywhere in Seconds
Paper
• 2312.02663
• Published
• 32
Orthogonal Adaptation for Modular Customization of Diffusion Models
Paper
• 2312.02432
• Published
• 14
ReconFusion: 3D Reconstruction with Diffusion Priors
Paper
• 2312.02981
• Published
• 10
ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation
Paper
• 2312.02201
• Published
• 35
Fine-grained Controllable Video Generation via Object Appearance and Context
Paper
• 2312.02919
• Published
• 13
VideoBooth: Diffusion-based Video Generation with Image Prompts
Paper
• 2312.00777
• Published
• 24
MoMask: Generative Masked Modeling of 3D Human Motions
Paper
• 2312.00063
• Published
• 18
Make Pixels Dance: High-Dynamic Video Generation
Paper
• 2311.10982
• Published
• 68
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Paper
• 2312.03491
• Published
• 34
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Paper
• 2312.03818
• Published
• 34
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
Paper
• 2312.03793
• Published
• 18
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Paper
• 2312.04461
• Published
• 62
Controllable Human-Object Interaction Synthesis
Paper
• 2312.03913
• Published
• 23
Photorealistic Video Generation with Diffusion Models
Paper
• 2312.06662
• Published
• 24
Context Tuning for Retrieval Augmented Generation
Paper
• 2312.05708
• Published
• 16
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Paper
• 2312.09911
• Published
• 55
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
Paper
• 2312.09767
• Published
• 27
Faithful Persona-based Conversational Dataset Generation with Large Language Models
Paper
• 2312.10007
• Published
• 11
VecFusion: Vector Font Generation with Diffusion
Paper
• 2312.10540
• Published
• 22
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
Paper
• 2312.11461
• Published
• 20
VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder
Paper
• 2312.11459
• Published
• 6
VidToMe: Video Token Merging for Zero-Shot Video Editing
Paper
• 2312.10656
• Published
• 11
Gemini: A Family of Highly Capable Multimodal Models
Paper
• 2312.11805
• Published
• 49
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Paper
• 2312.14125
• Published
• 47
Scalable Pre-training of Large Autoregressive Image Models
Paper
• 2401.08541
• Published
• 38
Aria Everyday Activities Dataset
Paper
• 2402.13349
• Published
• 31
SDXL-Lightning: Progressive Adversarial Diffusion Distillation
Paper
• 2402.13929
• Published
• 27
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper
• 2402.12226
• Published
• 45
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper
• 2402.17177
• Published
• 88
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper
• 2402.17485
• Published
• 194
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Paper
• 2402.17723
• Published
• 16
Design2Code: How Far Are We From Automating Front-End Engineering?
Paper
• 2403.03163
• Published
• 98
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Paper
• 2403.03100
• Published
• 38
Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks
Paper
• 2403.05185
• Published
• 23
RAFT: Adapting Language Model to Domain Specific RAG
Paper
• 2403.10131
• Published
• 72
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
Paper
• 2403.20275
• Published
• 10
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
Paper
• 2404.04421
• Published
• 18
Audio Dialogues: Dialogues dataset for audio and music understanding
Paper
• 2404.07616
• Published
• 16
KAN: Kolmogorov-Arnold Networks
Paper
• 2404.19756
• Published
• 116
Layer-Condensed KV Cache for Efficient Inference of Large Language Models
Paper
• 2405.10637
• Published
• 22