Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

Times cited: 983
Authors
Pavlakos, Georgios [1, 2]
Choutas, Vasileios [1]
Ghorbani, Nima [1]
Bolkart, Timo [1]
Osman, Ahmed A. A. [1]
Tzionas, Dimitrios [1]
Black, Michael J. [1]
Affiliations
[1] Max Planck Institute for Intelligent Systems, Tübingen, Germany
[2] University of Pennsylvania, Philadelphia, PA 19104, USA
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
Keywords
MOTION CAPTURE; TRACKING; RECONSTRUCTION; SPACE; MODEL; POSE
DOI
10.1109/CVPR.2019.01123
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X directly from images is challenging without paired images and 3D ground truth. Consequently, we follow the approach of SMPLify, which estimates 2D features and then optimizes model parameters to fit the features. We improve on SMPLify in several significant ways: (1) we detect 2D features corresponding to the face, hands, and feet and fit the full SMPL-X model to these; (2) we train a new neural network pose prior using a large MoCap dataset; (3) we define a new interpenetration penalty that is both fast and accurate; (4) we automatically detect gender and the appropriate body models (male, female, or neutral); (5) our PyTorch implementation achieves a speedup of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild. We evaluate 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth. This is a step towards automatic expressive human capture from monocular RGB data.
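The fitting strategy the abstract describes (detect 2D features, then optimize model parameters so the model's projection matches them) can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not SMPLify-X: the "model" is a fixed set of 3D joints, the only parameters are a hypothetical weak-perspective camera (scale `s`, shift `tx`, `ty`), gradients come from finite differences rather than PyTorch autograd, and the optional scale prior is only a stand-in for the paper's learned pose prior. The data term weights each keypoint by its detector confidence and applies a Geman-McClure robust penalty, of the kind used in SMPLify-style objectives. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def geman_mcclure(residual_sq: float, sigma: float) -> float:
    """Robust penalty: close to residual_sq when small, bounded by sigma**2 for outliers."""
    return sigma ** 2 * residual_sq / (sigma ** 2 + residual_sq)


def project(joints: Sequence[Point3], params: Sequence[float]) -> List[Point2]:
    """Assumed weak-perspective camera: drop depth, scale by s, shift by (tx, ty)."""
    s, tx, ty = params
    return [(s * x + tx, s * y + ty) for x, y, _z in joints]


def objective(params, joints, keypoints, conf, sigma=10.0, prior_w=0.0):
    """Confidence-weighted robust reprojection error plus an optional prior."""
    data = 0.0
    for (px, py), (kx, ky), c in zip(project(joints, params), keypoints, conf):
        data += c * geman_mcclure((px - kx) ** 2 + (py - ky) ** 2, sigma)
    # Hypothetical stand-in for a learned prior: penalize scale drifting from 1.
    prior = prior_w * (params[0] - 1.0) ** 2
    return data + prior


def fit(joints, keypoints, conf, init=(1.0, 0.0, 0.0), lr=0.05, steps=2000,
        eps=1e-6, **obj_kwargs):
    """Plain gradient descent with central finite-difference gradients."""
    params = list(init)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up, down = params.copy(), params.copy()
            up[i] += eps
            down[i] -= eps
            f_up = objective(up, joints, keypoints, conf, **obj_kwargs)
            f_down = objective(down, joints, keypoints, conf, **obj_kwargs)
            grad.append((f_up - f_down) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


if __name__ == "__main__":
    joints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.5)]
    keypoints = project(joints, (2.0, 0.5, -0.3))  # synthetic "detections"
    print([round(p, 3) for p in fit(joints, keypoints, [1.0] * 4)])
    # -> [2.0, 0.5, -0.3]
```

Setting a keypoint's confidence to 0 removes it from the data term, so a detection the detector itself distrusts does not bias the fit. The robust penalty additionally caps the cost of any single large residual at `sigma**2`.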
Pages: 10967-10977
Page count: 11