Monocular 3D Pose Estimation via Pose Grammar and Data Augmentation

Cited by: 34

Authors:

Xu, Yuanlu [1]

Wang, Wenguan [1]

Liu, Tengyu [1]

Liu, Xiaobai [2]

Xie, Jianwen [3]

Zhu, Song-Chun [1]

Affiliations:

[1] Univ Calif Los Angeles UCLA, Dept Comp Sci & Stat, Los Angeles, CA 90095 USA

[2] San Diego State Univ SDSU, Dept Comp Sci, San Diego, CA 92182 USA

[3] Baidu Res, Sunnyvale, CA 94089 USA

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2022 / Vol. 44 / No. 10

Funding:

US National Science Foundation;

Keywords:

Three-dimensional displays; Grammar; Pose estimation; Cameras; Solid modeling; Training; Protocols; 3D pose estimation; dependency grammar; data augmentation; deep neural network; recurrent neural network; evaluation protocol; learning-by-synthesis;

DOI:

10.1109/TPAMI.2021.3087695

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation from a monocular RGB image. Our model takes an estimated 2D pose as input and learns a generalized 2D-3D mapping function that lifts it to a 3D pose. The proposed model consists of a base network, which efficiently captures pose-aligned features, and a hierarchy of bi-directional RNNs (BRNNs) on top, which explicitly incorporates knowledge of human body configuration (i.e., kinematics, symmetry, motor coordination). The proposed model thus enforces high-level constraints over human poses. In learning, we develop a data augmentation algorithm to further improve the model's robustness against appearance variations and its cross-view generalization ability. We validate our method on public 3D human pose benchmarks and propose a new evaluation protocol for the cross-view setting to verify the generalization capability of different methods. We empirically observe that most state-of-the-art methods encounter difficulty in this setting, while our method handles it well.
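
The cross-view augmentation idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it only shows one common way to synthesize a 2D input and 3D target pair by viewing a known 3D pose from a virtual camera. The function names, the rotation about the vertical axis, the pinhole model, and the default focal length, image center, and depth are all assumptions made for illustration.

```python
import numpy as np

def rotate_about_y(pose_3d, angle_rad):
    """Rotate a (J, 3) pose about the vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    return pose_3d @ rot.T

def project_perspective(pose_cam, focal=1000.0, center=(500.0, 500.0)):
    """Pinhole projection of (J, 3) camera-space points to (J, 2) pixels."""
    z = pose_cam[:, 2:3]
    return focal * pose_cam[:, :2] / z + np.asarray(center)

def synthesize_pair(pose_3d, angle_rad, depth=4000.0):
    """Build one (2D input, 3D target) pair from a root-centered 3D pose.

    Units are assumed to be millimeters; `depth` is a hypothetical distance
    that places the rotated pose in front of the virtual camera.
    """
    rotated = rotate_about_y(pose_3d, angle_rad)
    pose_cam = rotated + np.array([0.0, 0.0, depth])
    return project_perspective(pose_cam), rotated
```

Sampling `angle_rad` over a range of values would yield training pairs seen from viewpoints that are absent from the captured data, which is the kind of variation a cross-view evaluation protocol is meant to probe.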


Pages: 6327-6344

Page count: 18

