LIA: Latent Image Animator

Cited by: 3
Authors
Wang, Yaohui [1]
Yang, Di [1]
Bremond, Francois [1]
Dantcheva, Antitza [1]
Affiliations
[1] Univ Cote dAzur, Inria Ctr, 2004 Rte Lucioles, F-06902 Valbonne, France
Keywords
Disentanglement; generative adversarial networks; image animation; interpretability; video generation
DOI
10.1109/TPAMI.2024.3449075
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Previous animation techniques mainly focus on leveraging explicit structure representations (e.g., meshes or keypoints) for transferring motion from driving videos to source images. However, such methods struggle with large appearance variations between source and driving data, and they require complex additional modules to model appearance and motion separately. To address these issues, we introduce the Latent Image Animator (LIA), streamlined to animate high-resolution images. LIA is designed as a simple autoencoder that does not rely on explicit representations. Motion transfer in the pixel space is modeled as linear navigation of motion codes in the latent space. Specifically, such navigation is represented as an orthogonal motion dictionary learned in a self-supervised manner based on the proposed Linear Motion Decomposition (LMD). Extensive experimental results demonstrate that LIA outperforms state-of-the-art methods on the VoxCeleb, TaichiHD, and TED-talk datasets with respect to video quality and spatio-temporal consistency. In addition, LIA is well equipped for zero-shot high-resolution image animation. Code, models, and demo video are available at https://github.com/wyhsirius/LIA.
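
The idea of "linear navigation of motion codes" along an orthogonal dictionary can be illustrated with a small toy example. The sketch below is not the authors' implementation: the function names (`gram_schmidt`, `navigate`), the dimensions, and the random vectors are all assumptions made for illustration, and no encoder, generator, or training is involved. It only shows that moving a latent code by a weighted sum of orthonormal directions lets each weight be read back by a simple projection.

```python
import math
import random


def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w that lies along b.
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm < 1e-12:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


def navigate(z_start, dictionary, magnitudes):
    """Return z_start + sum_i magnitudes[i] * dictionary[i]."""
    z = list(z_start)
    for a, d in zip(magnitudes, dictionary):
        z = [zi + a * di for zi, di in zip(z, d)]
    return z


random.seed(0)
DIM, NUM_DIRECTIONS = 8, 3  # toy sizes, chosen only for this illustration

# Hypothetical "motion dictionary": random directions made orthonormal.
raw = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_DIRECTIONS)]
motion_dictionary = gram_schmidt(raw)

# Hypothetical latent code of a source image and per-direction magnitudes.
z_source = [random.gauss(0.0, 1.0) for _ in range(DIM)]
magnitudes = [0.5, -1.0, 2.0]
z_driven = navigate(z_source, motion_dictionary, magnitudes)

# With orthonormal directions, projecting the displacement onto each
# direction recovers the magnitude that was applied along it.
displacement = [a - b for a, b in zip(z_driven, z_source)]
recovered = [sum(x * y for x, y in zip(displacement, d)) for d in motion_dictionary]
```

In this toy setting the orthogonality is what keeps the directions from interfering with one another, which is one plausible reading of why an orthogonal dictionary supports the interpretability mentioned in the keywords.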
Pages: 10829-10844
Number of pages: 16