ToonTalker: Cross-Domain Face Reenactment

Cited by: 4
Authors
Gong, Yuan [1]
Zhang, Yong [2]
Cun, Xiaodong [2]
Yin, Fei [1]
Fan, Yanbo [2]
Wang, Xuan [3]
Wu, Baoyuan [4]
Yang, Yujiu [1]
Affiliations
[1] Tsinghua University, Shenzhen International Graduate School, Beijing, People's Republic of China
[2] Tencent AI Lab, Shanghai, People's Republic of China
[3] Ant Group, Hangzhou, People's Republic of China
[4] Chinese University of Hong Kong (Shenzhen), Shenzhen Research Institute of Big Data, School of Data Science, Shenzhen, People's Republic of China
Source
2023 IEEE/CVF International Conference on Computer Vision (ICCV) | 2023
Funding
National Natural Science Foundation of China;
DOI
10.1109/ICCV51070.2023.00707
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We target cross-domain face reenactment in this paper, i.e., driving a cartoon image with the video of a real person and vice versa. Recently, many works have focused on one-shot talking face generation that drives a portrait with a real video, i.e., within-domain reenactment. Directly applying those methods to cross-domain animation causes inaccurate expression transfer, blurring, and even obvious artifacts due to the domain shift between cartoon and real faces. Only a few works attempt to address cross-domain face reenactment. The most related work, AnimeCeleb [13], requires constructing a dataset of pose-vector and cartoon-image pairs by animating 3D characters, which makes it inapplicable when no paired data is available. In this paper, we propose a novel method for cross-domain reenactment without paired data. Specifically, we propose a transformer-based framework that aligns the motions from different domains into a common latent space, where motion transfer is conducted via latent code addition. Two domain-specific motion encoders and two learnable motion base memories are used to capture domain properties. A source query transformer and a driving query transformer project the domain-specific motions into the canonical space. The edited motion is projected back to the domain of the source with another transformer. Moreover, since no paired data is provided, we propose a novel cross-domain training scheme that uses data from the two domains with a designed analogy constraint. In addition, we contribute a Disney-style cartoon dataset. Extensive evaluations demonstrate the superiority of our method over competing methods.
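To make the pipeline described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of the named components: two domain-specific motion encoders, two learnable motion-base memories, a source and a driving query transformer that project motions into a shared canonical space, motion transfer via latent code addition, and a transformer that projects the edited motion back to the source domain. All module names, dimensions, and layer settings here (MotionEncoder, QueryTransformer, dim=256, num_bases=20, the encoder backbone) are illustrative assumptions rather than the authors' implementation, and the generator/warping module that renders the final frame is omitted.

```python
# Hypothetical sketch of the cross-domain motion-alignment idea (not the authors' code).
import torch
import torch.nn as nn

class MotionEncoder(nn.Module):
    """Domain-specific encoder: maps a face image to a motion feature (assumed design)."""
    def __init__(self, dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, dim),
        )

    def forward(self, img):                            # img: (B, 3, H, W)
        return self.backbone(img)                      # (B, dim)

class QueryTransformer(nn.Module):
    """Attends a motion feature (query) to a motion-base memory to produce a motion code."""
    def __init__(self, dim=256, num_layers=2, num_heads=4):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, motion_feat, bases):             # motion_feat: (B, dim), bases: (K, dim)
        q = motion_feat.unsqueeze(1)                            # (B, 1, dim) query
        mem = bases.unsqueeze(0).expand(q.size(0), -1, -1)      # (B, K, dim) memory
        return self.decoder(q, mem).squeeze(1)                  # (B, dim)

class CrossDomainMotionTransfer(nn.Module):
    """Aligns cartoon/real motions in a shared space and transfers via latent addition."""
    def __init__(self, dim=256, num_bases=20):
        super().__init__()
        # two domain-specific motion encoders and two learnable motion-base memories
        self.enc = nn.ModuleDict({"cartoon": MotionEncoder(dim), "real": MotionEncoder(dim)})
        self.bases = nn.ParameterDict({
            "cartoon": nn.Parameter(torch.randn(num_bases, dim)),
            "real": nn.Parameter(torch.randn(num_bases, dim)),
        })
        self.src_qt = QueryTransformer(dim)   # source query transformer  -> canonical space
        self.drv_qt = QueryTransformer(dim)   # driving query transformer -> canonical space
        self.back_qt = QueryTransformer(dim)  # projects the edited motion back to the source domain

    def forward(self, src_img, drv_img, src_domain, drv_domain):
        m_src = self.enc[src_domain](src_img)                 # domain-specific motion features
        m_drv = self.enc[drv_domain](drv_img)
        c_src = self.src_qt(m_src, self.bases[src_domain])    # canonical motion codes
        c_drv = self.drv_qt(m_drv, self.bases[drv_domain])
        c_edit = c_src + c_drv                                # motion transfer via latent code addition
        return self.back_qt(c_edit, self.bases[src_domain])   # motion code in the source domain
```

A short usage example under the same assumptions:

```python
# Example: drive a cartoon source portrait with a real driving frame (random tensors here).
model = CrossDomainMotionTransfer()
src = torch.randn(2, 3, 256, 256)
drv = torch.randn(2, 3, 256, 256)
motion_code = model(src, drv, src_domain="cartoon", drv_domain="real")   # (2, 256)
```

The shared canonical space is what makes the latent addition meaningful across cartoon and real faces; the cross-domain training scheme with the analogy constraint mentioned in the abstract is a training-time component and is not shown in this sketch.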
Pages: 7656 - 7666
Page count: 11
References (33 in total)
  • [1] Karras, Tero; Laine, Samuli; Aila, Timo. A Style-Based Generator Architecture for Generative Adversarial Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [2] Bansal, Aayush; Ma, Shugao; Ramanan, Deva; Sheikh, Yaser. Recycle-GAN: Unsupervised Video Retargeting. Computer Vision - ECCV 2018, Pt V, 2018, 11209: 122-138.
  • [3] Blanz, V.; Vetter, T. A Morphable Model for the Synthesis of 3D Faces. SIGGRAPH 99 Conference Proceedings, 1999: 187-194.
  • [4] Booth, James. 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 5543.
  • [5] Bounareli, Stella. Finding Directions in GAN's Latent Space for Neural Face Reenactment.
  • [6] Chen, Zhuo. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 13515.
  • [7] Deng, Jiankang; Guo, Jia; Xue, Niannan; Zafeiriou, Stefanos. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 4685-4694.
  • [8] Doukas, Michail Christos. HeadGAN: One-Shot Neural Head Synthesis and Editing.
  • [9] Drobyshev, Nikita. MegaPortraits: One-Shot Megapixel Neural Head Avatars.
  • [10] Hensel, M. Advances in Neural Information Processing Systems, Vol. 30, 2017.