3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network

Cited by: 306
Authors
Li, Sijin [1 ]
Chan, Antoni B. [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Kowloon Tong, Hong Kong, Peoples R China
Source
COMPUTER VISION - ACCV 2014, PT II | 2015 / Vol. 9004
Keywords
DOI
10.1007/978-3-319-16808-1_23
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a deep convolutional neural network for 3D human pose estimation from monocular images. We train the network using two strategies: (1) a multi-task framework that jointly trains pose regression and body part detectors; (2) a pre-training strategy where the pose regressor is initialized using a network trained for body part detection. We compare our network on a large data set and achieve significant improvement over baseline methods. Human pose estimation is a structured prediction problem, i.e., the locations of each body part are highly correlated. Although we do not add constraints about the correlations between body parts to the network, we empirically show that the network has disentangled the dependencies among different body parts, and learned their correlations.
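The multi-task strategy in the abstract, in which pose regression and body part detection are trained jointly, can be illustrated with a combined objective. The sketch below is only an assumption about how such an objective might look: the abstract does not state the loss functions, so the squared-error regression term, the binary cross-entropy detection term, the `detection_weight` parameter, and all function names are hypothetical choices made for illustration.

```python
import math


def pose_regression_loss(pred_joints, true_joints):
    """Mean squared Euclidean distance between predicted and true 3D joints.

    Assumed form; each joint is an (x, y, z) sequence.
    """
    total = 0.0
    for pred, true in zip(pred_joints, true_joints):
        total += sum((p - t) ** 2 for p, t in zip(pred, true))
    return total / len(true_joints)


def part_detection_loss(pred_probs, labels, eps=1e-12):
    """Mean binary cross-entropy over body part detection outputs.

    Assumed form; pred_probs are probabilities in [0, 1], labels are 0 or 1.
    """
    total = 0.0
    for prob, label in zip(pred_probs, labels):
        prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= label * math.log(prob) + (1 - label) * math.log(1.0 - prob)
    return total / len(labels)


def multi_task_loss(pred_joints, true_joints, pred_probs, labels,
                    detection_weight=1.0):
    """Weighted sum of the two task losses (hypothetical weighting scheme)."""
    regression = pose_regression_loss(pred_joints, true_joints)
    detection = part_detection_loss(pred_probs, labels)
    return regression + detection_weight * detection


if __name__ == "__main__":
    pred = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
    true = [[0.0, 0.0, 0.0], [1.0, 1.0, 2.0]]
    print(multi_task_loss(pred, true, [0.5, 0.5], [1, 0], detection_weight=2.0))
```

In a network trained this way, both terms would be computed from outputs that share the same convolutional features, so gradients from the detection task also shape the features used for regression. The pre-training strategy in the abstract would instead minimize the detection term first and then use the resulting weights to initialize the pose regressor.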
Pages: 332-347
Number of pages: 16