ToonTalker: Cross-Domain Face Reenactment

Cited by: 4
Authors
Gong, Yuan [1]
Zhang, Yong [2]
Cun, Xiaodong [2]
Yin, Fei [1]
Fan, Yanbo [2]
Wang, Xuan [3]
Wu, Baoyuan [4]
Yang, Yujiu [1]
Affiliations
[1] Tsinghua University, Shenzhen International Graduate School, Beijing, People's Republic of China
[2] Tencent AI Lab, Shanghai, People's Republic of China
[3] Ant Group, Hangzhou, People's Republic of China
[4] Chinese University of Hong Kong (Shenzhen), Shenzhen Research Institute of Big Data, School of Data Science, Shenzhen, People's Republic of China
Source
2023 IEEE/CVF International Conference on Computer Vision (ICCV) | 2023
Funding
National Natural Science Foundation of China;
DOI
10.1109/ICCV51070.2023.00707
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We target cross-domain face reenactment in this paper, i.e., driving a cartoon image with the video of a real person and vice versa. Recently, many works have focused on one-shot talking face generation that drives a portrait with a real video, i.e., within-domain reenactment. Directly applying those methods to cross-domain animation causes inaccurate expression transfer, blurring, and even obvious artifacts due to the domain shift between cartoon and real faces. Only a few works attempt to address cross-domain face reenactment. The most related work, AnimeCeleb [13], requires constructing a dataset of pose-vector and cartoon-image pairs by animating 3D characters, which makes it inapplicable when no paired data is available. In this paper, we propose a novel method for cross-domain reenactment without paired data. Specifically, we propose a transformer-based framework that aligns the motions from different domains into a common latent space, where motion transfer is conducted via latent code addition. Two domain-specific motion encoders and two learnable motion base memories are used to capture domain properties. A source query transformer and a driving query transformer project the domain-specific motions into the canonical space. The edited motion is projected back to the domain of the source with another transformer. Moreover, since no paired data is provided, we propose a novel cross-domain training scheme that uses data from the two domains with a designed analogy constraint. In addition, we contribute a Disney-style cartoon dataset. Extensive evaluations demonstrate the superiority of our method over competing methods.
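To make the pipeline described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of the named components: two domain-specific motion encoders, two learnable motion-base memories, a source and a driving query transformer that project motions into a shared canonical space, motion transfer via latent code addition, and a transformer that projects the edited motion back to the source domain. All module names, dimensions, and layer settings here (MotionEncoder, QueryTransformer, dim=256, num_bases=20, the encoder backbone) are illustrative assumptions rather than the authors' implementation, and the generator/warping module that renders the final frame is omitted.

```python
# Hypothetical sketch of the cross-domain motion-alignment idea (not the authors' code).
import torch
import torch.nn as nn

class MotionEncoder(nn.Module):
    """Domain-specific encoder: maps a face image to a motion feature (assumed design)."""
    def __init__(self, dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, dim),
        )

    def forward(self, img):                            # img: (B, 3, H, W)
        return self.backbone(img)                      # (B, dim)

class QueryTransformer(nn.Module):
    """Attends a motion feature (query) to a motion-base memory to produce a motion code."""
    def __init__(self, dim=256, num_layers=2, num_heads=4):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, motion_feat, bases):             # motion_feat: (B, dim), bases: (K, dim)
        q = motion_feat.unsqueeze(1)                            # (B, 1, dim) query
        mem = bases.unsqueeze(0).expand(q.size(0), -1, -1)      # (B, K, dim) memory
        return self.decoder(q, mem).squeeze(1)                  # (B, dim)

class CrossDomainMotionTransfer(nn.Module):
    """Aligns cartoon/real motions in a shared space and transfers via latent addition."""
    def __init__(self, dim=256, num_bases=20):
        super().__init__()
        # two domain-specific motion encoders and two learnable motion-base memories
        self.enc = nn.ModuleDict({"cartoon": MotionEncoder(dim), "real": MotionEncoder(dim)})
        self.bases = nn.ParameterDict({
            "cartoon": nn.Parameter(torch.randn(num_bases, dim)),
            "real": nn.Parameter(torch.randn(num_bases, dim)),
        })
        self.src_qt = QueryTransformer(dim)   # source query transformer  -> canonical space
        self.drv_qt = QueryTransformer(dim)   # driving query transformer -> canonical space
        self.back_qt = QueryTransformer(dim)  # projects the edited motion back to the source domain

    def forward(self, src_img, drv_img, src_domain, drv_domain):
        m_src = self.enc[src_domain](src_img)                 # domain-specific motion features
        m_drv = self.enc[drv_domain](drv_img)
        c_src = self.src_qt(m_src, self.bases[src_domain])    # canonical motion codes
        c_drv = self.drv_qt(m_drv, self.bases[drv_domain])
        c_edit = c_src + c_drv                                # motion transfer via latent code addition
        return self.back_qt(c_edit, self.bases[src_domain])   # motion code in the source domain
```

A short usage example under the same assumptions:

```python
# Example: drive a cartoon source portrait with a real driving frame (random tensors here).
model = CrossDomainMotionTransfer()
src = torch.randn(2, 3, 256, 256)
drv = torch.randn(2, 3, 256, 256)
motion_code = model(src, drv, src_domain="cartoon", drv_domain="real")   # (2, 256)
```

The shared canonical space is what makes the latent addition meaningful across cartoon and real faces; the cross-domain training scheme with the analogy constraint mentioned in the abstract is a training-time component and is not shown in this sketch.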
Pages: 7656 - 7666
Page count: 11
References (33 in total)
  • [1] Karras, Tero; Laine, Samuli; Aila, Timo. A Style-Based Generator Architecture for Generative Adversarial Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [2] Bansal, Aayush; Ma, Shugao; Ramanan, Deva; Sheikh, Yaser. Recycle-GAN: Unsupervised Video Retargeting. Computer Vision - ECCV 2018, Pt V, 2018, 11209: 122-138.
  • [3] Blanz, V.; Vetter, T. A Morphable Model for the Synthesis of 3D Faces. SIGGRAPH 99 Conference Proceedings, 1999: 187-194.
  • [4] Booth, James. 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 5543.
  • [5] Bounareli, Stella. Finding Directions in GAN's Latent Space for Neural Face Reenactment.
  • [6] Chen, Zhuo. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 13515.
  • [7] Deng, Jiankang; Guo, Jia; Xue, Niannan; Zafeiriou, Stefanos. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 4685-4694.
  • [8] Doukas, Michail Christos. HeadGAN: One-Shot Neural Head Synthesis and Editing.
  • [9] Drobyshev, Nikita. MegaPortraits: One-Shot Megapixel Neural Head Avatars.
  • [10] Hensel, M. Advances in Neural Information Processing Systems, Vol. 30, 2017.