Monocular 3D Pose Estimation via Pose Grammar and Data Augmentation

Cited by: 34
Authors
Xu, Yuanlu [1]
Wang, Wenguan [1]
Liu, Tengyu [1]
Liu, Xiaobai [2]
Xie, Jianwen [3]
Zhu, Song-Chun [1]
Affiliations
[1] Univ Calif Los Angeles UCLA, Dept Comp Sci & Stat, Los Angeles, CA 90095 USA
[2] San Diego State Univ SDSU, Dept Comp Sci, San Diego, CA 92182 USA
[3] Baidu Res, Sunnyvale, CA 94089 USA
Funding
National Science Foundation (USA);
Keywords
Three-dimensional displays; Grammar; Pose estimation; Cameras; Solid modeling; Training; Protocols; 3D pose estimation; dependency grammar; data augmentation; deep neural network; recurrent neural network; evaluation protocol; learning-by-synthesis;
DOI
10.1109/TPAMI.2021.3087695
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation from a monocular RGB image. Our model takes an estimated 2D pose as input and learns a generalized 2D-to-3D mapping function that lifts it to a 3D pose. The proposed model consists of a base network that efficiently captures pose-aligned features and, on top of it, a hierarchy of bi-directional RNNs (BRNNs) that explicitly incorporates knowledge about human body configuration (i.e., kinematics, symmetry, motor coordination). The model thus enforces high-level constraints over human poses. For learning, we develop a data augmentation algorithm that further improves the model's robustness to appearance variations and its cross-view generalization ability. We validate our method on public 3D human pose benchmarks and propose a new evaluation protocol for the cross-view setting to verify the generalization capability of different methods. We empirically observe that most state-of-the-art methods encounter difficulty in this setting, while our method handles it well.
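The abstract does not spell out the data augmentation algorithm. As a rough illustration of the learning-by-synthesis idea named in the keywords, the sketch below assumes one simple scheme: rotate a ground-truth 3D pose about the vertical axis to simulate virtual cameras, then project it with a pinhole model to obtain extra 2D-3D training pairs. The function names, the focal length, and the camera depth are hypothetical choices for this example, not values taken from the paper.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3D point (x, y, z) about the vertical y-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(point, focal=1000.0, depth=5.0):
    """Pinhole projection; the point is first shifted `depth` units along z.

    `focal` and `depth` are illustrative assumptions, not values from the paper.
    """
    x, y, z = point
    zc = z + depth
    if zc <= 0:
        raise ValueError("point is behind the virtual camera")
    return (focal * x / zc, focal * y / zc)


def synthesize_pairs(pose_3d, num_views=4, focal=1000.0, depth=5.0):
    """Return one (pose_2d, pose_3d_in_view) pair per evenly spaced virtual view."""
    pairs = []
    for k in range(num_views):
        angle = 2.0 * math.pi * k / num_views
        rotated = [rotate_y(joint, angle) for joint in pose_3d]
        pose_2d = [project(joint, focal, depth) for joint in rotated]
        pairs.append((pose_2d, rotated))
    return pairs


# Toy three-joint "skeleton" (hypothetical coordinates, in meters).
toy_pose = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 1.0, 0.0)]
augmented = synthesize_pairs(toy_pose, num_views=4)
```

Each returned pair couples a synthetic 2D pose with the 3D pose expressed in the corresponding virtual view, which is the form of supervision a 2D-to-3D mapping network would consume under this assumed scheme.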
Pages: 6327-6344
Page count: 18