Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision

Cited by: 205
Authors
Sanyal, Soubhik [1]
Bolkart, Timo [1]
Feng, Haiwen [1]
Black, Michael J. [1]
Affiliations
[1] Max Planck Inst Intelligent Syst, Perceiving Syst Dept, Stuttgart, Germany
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.00795
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The estimation of 3D face shape from a single image must be robust to variations in lighting, head pose, expression, facial hair, makeup, and occlusions. Robustness requires a large training set of in-the-wild images, which, by construction, lack ground truth 3D shape. To train a network without any 2D-to-3D supervision, we present RingNet, which learns to compute 3D face shape from a single image. Our key observation is that an individual's face shape is constant across images, regardless of expression, pose, lighting, etc. RingNet leverages multiple images of a person and automatically detected 2D face features. It uses a novel loss that encourages the face shape to be similar when the identity is the same and different for different people. We achieve invariance to expression by representing the face using the FLAME model. Once trained, our method takes a single image and outputs the parameters of FLAME, which can be readily animated. Additionally, we create a new database of faces "not quite in-the-wild" (NoW) with 3D head scans and high-resolution images of the subjects in a wide variety of conditions. We evaluate publicly available methods and find that RingNet is more accurate than methods that use 3D supervision. The dataset, model, and results are available for research purposes at http://ringnet.is.tuebingen.mpg.de.
Pages: 7755-7764
Number of pages: 10
Related papers
39 in total
[1]  
Abadi M., 2015, P 12 USENIX S OPERAT
[2]   Extreme 3D Face Reconstruction: Seeing Through Occlusions [J].
Anh Tuan Tran ;
Hassner, Tal ;
Masi, Iacopo ;
Paz, Eran ;
Nirkin, Yuval ;
Medioni, Gerard .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :3935-3944
[3]  
[Anonymous], 2016, P IEEE C COMPUTER VI, DOI 10.1109/CVPR.2016.262
[4]  
Bas A., 2016, P AS C COMP VIS, P377, DOI 10.1007/978-3-319-54427-4_28
[5]   A morphable model for the synthesis of 3D faces [J].
Blanz, V ;
Vetter, T .
SIGGRAPH 99 CONFERENCE PROCEEDINGS, 1999, :187-194
[6]   Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources [J].
Bulat, Adrian ;
Tzimiropoulos, Georgios .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :3726-3734
[7]   Displaced Dynamic Expression Regression for Real-time Facial Tracking and Animation [J].
Cao, Chen ;
Hou, Qiming ;
Zhou, Kun .
ACM TRANSACTIONS ON GRAPHICS, 2014, 33 (04)
[8]   Human Pose Estimation with Iterative Error Feedback [J].
Carreira, Joao ;
Agrawal, Pulkit ;
Fragkiadaki, Katerina ;
Malik, Jitendra .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :4733-4742
[9]   A 3D Morphable Model of Craniofacial Shape and Texture Variation [J].
Dai, Hang ;
Pears, Nick ;
Smith, William ;
Duncan, Christian .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :3104-3112
[10]   Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network [J].
Feng, Yao ;
Wu, Fan ;
Shao, Xiaohu ;
Wang, Yanfeng ;
Zhou, Xi .
COMPUTER VISION - ECCV 2018, PT XIV, 2018, 11218 :557-574