Monocular Expressive Body Regression Through Body-Driven Attention

被引：171

作者：

Choutas, Vasileios ^{[1
,2
]}

Pavlakos, Georgios ^{[3
]}

Bolkart, Timo ^{[1
]}

Tzionas, Dimitrios ^{[1
]}

Black, Michael J. ^{[1
]}

机构：

[1] Max Planck Inst Intelligent Syst, Tubingen, Germany

[2] Max Planck ETH Ctr Learning Syst, Tubingen, Germany

[3] Univ Penn, Philadelphia, PA USA

来源：

COMPUTER VISION - ECCV 2020, PT X | 2020年 / 12355卷

关键词：

HAND POSE ESTIMATION; 3D; SHAPE; TRACKING; CAPTURE; PEOPLE; MODEL;

D O I：

10.1007/978-3-030-58607-2_2

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

To understand how people look, interact, or perform tasks, we need to quickly and accurately capture their 3D body, face, and hands together from an RGB image. Most existing methods focus only on parts of the body. A few recent approaches reconstruct full expressive 3D humans from images using 3D body models that include the face and hands. These methods are optimization-based and thus slow, prone to local optima, and require 2D keypoints as input. We address these limitations by introducing ExPose (EXpressive POse and Shape rEgression), which directly regresses the body, face, and hands, in SMPL-X format, from an RGB image. This is a hard problem due to the high dimensionality of the body and the lack of expressive training data. Additionally, hands and faces are much smaller than the body, occupying very few image pixels. This makes hand and face estimation hard when body images are downscaled for neural networks. We make three main contributions. First, we account for the lack of training data by curating a dataset of SMPL-X fits on in-the-wild images. Second, we observe that body estimation localizes the face and hands reasonably well. We introduce body-driven attention for face and hand regions in the original image to extract higher-resolution crops that are fed to dedicated refinement modules. Third, these modules exploit part-specific knowledge from existing face- and hand-only datasets. ExPose estimates expressive 3D humans more accurately than existing optimization methods at a small fraction of the computational cost. Our data, model and code are available for research at https://expose.is.tue.mpg.de.

引用

页码：20 / 40

页数：21

共 106 条

[1] Recovering 3D human pose from monocular images [J].

Agarwal, A ;

Triggs, B .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2006, 28 (01) :44-58

[2]

Akhter I, 2015, PROC CVPR IEEE, P1446, DOI 10.1109/CVPR.2015.7298751

[3] 2D Human Pose Estimation: New Benchmark and State of the Art Analysis [J].

Andriluka, Mykhaylo ;

Pishchulin, Leonid ;

Gehler, Peter ;

Schiele, Bernt .

2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :3686-3693

[4] SCAPE: Shape Completion and Animation of People [J].

Anguelov, D ;

Srinivasan, P ;

Koller, D ;

Thrun, S ;

Rodgers, J ;

Davis, J .

ACM TRANSACTIONS ON GRAPHICS, 2005, 24 (03) :408-416

[5] Pushing the Envelope for RGB-based Dense 3D Hand Pose Estimation via Neural Rendering [J].

Baek, Seungryul ;

Kim, Kwang In ;

Kim, Tae-Kyun .

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :1067-1076

[6] A morphable model for the synthesis of 3D faces [J].

Blanz, V ;

Vetter, T .

SIGGRAPH 99 CONFERENCE PROCEEDINGS, 1999, :187-194

[7] Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image [J].

Bogo, Federica ;

Kanazawa, Angjoo ;

Lassner, Christoph ;

Gehler, Peter ;

Romero, Javier ;

Black, Michael J. .

COMPUTER VISION - ECCV 2016, PT V, 2016, 9909 :561-578

[8] 3D Hand Shape and Pose from Images in the Wild [J].

Boukhayma, Adnane ;

de Bem, Rodrigo ;

Torr, Philip H. S. .

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :10835-10844

[9] How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks) [J].

Bulat, Adrian ;

Tzimiropoulos, Georgios .

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :1021-1030

[10] OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields [J].

Cao, Zhe ;

Hidalgo, Gines ;

Simon, Tomas ;

Wei, Shih-En ;

Sheikh, Yaser .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (01) :172-186

← 1 2 3 4 5 6 7 8 9 10 →