Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality

Cited by: 2
Authors
Jourabloo, Amin [1 ]
De la Torre, Fernando [2 ]
Saragih, Jason [1 ]
Wei, Shih-En [1 ]
Lombardi, Stephen [1 ]
Wang, Te-Li [1 ]
Belko, Danielle [1 ]
Trimble, Autumn [1 ]
Badino, Hernan [1 ]
Affiliations
[1] Facebook Reality Labs, Pittsburgh, PA 15222 USA
[2] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022
DOI
10.1109/CVPR52688.2022.01968
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Social presence, the feeling of being there with a "real" person, will fuel the next generation of communication systems driven by digital humans in virtual reality (VR). The best 3D video-realistic VR avatars that minimize the uncanny effect rely on person-specific (PS) models. However, these PS models are time-consuming to build and are typically trained with limited data variability, which results in poor generalization and robustness. Major sources of variability that affect the accuracy of facial expression transfer algorithms include different VR headsets (e.g., camera configuration, slop of the headset), facial appearance changes over time (e.g., beard, make-up), and environmental factors (e.g., lighting, backgrounds). This is a major drawback for the scalability of these models in VR. This paper makes progress in overcoming these limitations by proposing an end-to-end multi-identity architecture (MIA) trained with specialized augmentation strategies. MIA drives the shape component of the avatar from three cameras in the VR headset (two eyes, one mouth) for untrained subjects, using minimal personalized information (i.e., a neutral 3D mesh shape). Similarly, if the PS texture decoder is available, MIA can drive the full avatar (shape + texture) robustly, outperforming PS models in challenging scenarios. Our key contribution to improving robustness and generalization is that our method implicitly decouples, in an unsupervised manner, the facial expression from nuisance factors (e.g., headset, environment, facial appearance). We demonstrate the superior performance and robustness of the proposed method versus state-of-the-art PS approaches in a variety of experiments.
Pages: 20291-20300
Page count: 10