3D Human Pose and Shape Reconstruction From Videos via Confidence-Aware Temporal Feature Aggregation

Times cited: 5

Authors:

Zhang, Hongrun [1]

Meng, Yanda [1]

Zhao, Yitian [2]

Qian, Xuesheng [3]

Qiao, Yihong [3]

Yang, Xiaoyun [4]

Zheng, Yalin [1]

Affiliations:

[1] Univ Liverpool, Inst Life Course & Med Sci, Liverpool L7 8TX, Merseyside, England

[2] Chinese Acad Sci, Ningbo Inst Mat Technol & Engn, Cixi Inst Biomed Engn, Ningbo 315201, Peoples R China

[3] China IntelliCloud Co, Shanghai, Peoples R China

[4] Remark AI UK Ltd, London SE1 9PD, England

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2023 / Vol. 25

Keywords:

Three-dimensional displays; Feature extraction; Shape; Training; Correlation; Solid modeling; Videos; Human pose; temporal estimation; uncertainty

DOI:

10.1109/TMM.2022.3167887

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Estimating 3D human body shapes and poses from videos is a challenging computer vision task. The intrinsic temporal information embedded in adjacent frames is helpful for making accurate estimations. Existing approaches learn temporal features of the target frames simply by aggregating the features of their adjacent frames with off-the-shelf deep neural networks. Consequently, these approaches cannot explicitly and effectively use the correlations between adjacent frames to help infer the parameters of the target frames. In this paper, we propose a novel framework that measures the correlations among adjacent frames in the form of an estimated confidence metric. The confidence value indicates to what extent the adjacent frames can help predict the target frames' 3D shapes and poses. Based on the estimated confidence values, temporally aggregated features are then obtained by adaptively allocating different weights to the temporal predicted features from the adjacent frames. The final 3D shapes and poses are estimated by regressing from the temporally aggregated features. Experimental results on three benchmark datasets show that the proposed method outperforms state-of-the-art approaches, even without motion priors involved in training. In particular, the proposed method is more robust against corrupted frames.
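The confidence-weighted aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes that each adjacent frame contributes one feature vector predicted for the target frame plus one scalar confidence score, and that the scores are normalized into weights with a softmax. The softmax choice, the function names, and the toy numbers are illustrative assumptions, not the paper's network, which learns both the features and the confidences and then regresses shape and pose from the result.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Map raw confidence scores to non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_temporal_features(
    frame_features: Sequence[Sequence[float]],
    confidences: Sequence[float],
) -> List[float]:
    """Confidence-weighted average of per-frame features for one target frame.

    frame_features[t]: feature vector predicted for the target frame from
        adjacent frame t (hypothetical input; the paper learns these).
    confidences[t]: scalar score for how much frame t should be trusted
        (hypothetical; the paper estimates it with a network).
    """
    if not frame_features or len(frame_features) != len(confidences):
        raise ValueError("need one confidence score per frame feature")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frame features must have the same length")

    weights = softmax(confidences)
    # Each output channel is the weighted sum of that channel across frames.
    return [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Toy window of three adjacent frames; the middle one is "corrupted".
    feats = [[1.0, 2.0], [100.0, -50.0], [1.2, 1.8]]
    scores = [2.0, -8.0, 2.0]  # low confidence for the corrupted frame
    print(aggregate_temporal_features(feats, scores))
```

In the toy call, the low score on the middle frame keeps its outlier feature from dominating, so the aggregate stays close to [1.1, 1.9], the average of the two trusted frames. This only mirrors, in spirit, the abstract's claim about robustness to corrupted frames; a plain unweighted mean would instead be pulled far toward the outlier.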


Pages: 3868-3880

Page count: 13

References: 60
