Exploiting Temporal Information for 3D Human Pose Estimation

Cited by: 255

Authors:

Hossain, Mir Rayat Imtiaz [1]

Little, James J. [1]

Affiliations:

[1] Univ British Columbia, Dept Comp Sci, Vancouver, BC, Canada

Source:

COMPUTER VISION - ECCV 2018, PT X | 2018 / Vol. 11214

Keywords:

3D human pose; Sequence-to-sequence networks; Layer normalized LSTM; Residual connections

DOI:

10.1007/978-3-030-01249-6_5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this work, we address the problem of 3D human pose estimation from a sequence of 2D human poses. Although the recent success of deep networks has led many state-of-the-art methods for 3D pose estimation to train deep networks end-to-end to predict from images directly, the top-performing approaches have shown the effectiveness of dividing the task of 3D pose estimation into two steps: using a state-of-the-art 2D pose estimator to estimate the 2D pose from images and then mapping it into 3D space. They also showed that a low-dimensional representation, such as the 2D locations of a set of joints, can be discriminative enough to estimate 3D pose with high accuracy. However, estimating 3D pose for individual frames leads to temporally incoherent estimates, because independent errors in each frame cause jitter. Therefore, in this work we utilize the temporal information across a sequence of 2D joint locations to estimate a sequence of 3D poses. We designed a sequence-to-sequence network composed of layer-normalized LSTM units with shortcut connections connecting the input to the output on the decoder side, and imposed a temporal smoothness constraint during training. We found that the knowledge of temporal consistency improves the best reported result on the Human3.6M dataset by approximately 12.2% and helps our network to recover temporally consistent 3D poses over a sequence of images even when the 2D pose detector fails.
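
As an illustration of the temporal smoothness idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the constraint can be approximated by penalizing squared first-order differences between consecutive predicted poses; the function names and the `smooth_weight` parameter are hypothetical, and the abstract does not specify the exact form or weighting of the constraint.

```python
from typing import Sequence

# One frame: flattened 3D joint coordinates (x1, y1, z1, x2, ...).
Pose = Sequence[float]


def mse(pred: Sequence[Pose], target: Sequence[Pose]) -> float:
    """Mean squared error over every coordinate of every frame."""
    total, count = 0.0, 0
    for p, t in zip(pred, target):
        for a, b in zip(p, t):
            total += (a - b) ** 2
            count += 1
    return total / count


def temporal_smoothness(pred: Sequence[Pose]) -> float:
    """Mean squared first-order difference between consecutive frames.

    Assumption: this simple penalty stands in for the paper's constraint.
    """
    if len(pred) < 2:
        return 0.0  # a single frame has no temporal variation
    total, count = 0.0, 0
    for prev, cur in zip(pred, pred[1:]):
        for a, b in zip(prev, cur):
            total += (b - a) ** 2
            count += 1
    return total / count


def sequence_loss(pred: Sequence[Pose], target: Sequence[Pose],
                  smooth_weight: float = 0.1) -> float:
    """Pose error plus a weighted smoothness penalty (hypothetical weight)."""
    return mse(pred, target) + smooth_weight * temporal_smoothness(pred)
```

A prediction that jitters from frame to frame receives a larger smoothness term than one that changes gradually, so under this assumed loss a network would be discouraged from producing temporally incoherent sequences.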

Pages: 69-86

Page count: 18
