3D Human Pose and Shape Reconstruction From Videos via Confidence-Aware Temporal Feature Aggregation

Cited by: 5
Authors
Zhang, Hongrun [1]
Meng, Yanda [1]
Zhao, Yitian [2]
Qian, Xuesheng [3]
Qiao, Yihong [3]
Yang, Xiaoyun [4]
Zheng, Yalin [1]
Affiliations
[1] Univ Liverpool, Inst Life Course & Med Sci, Liverpool L7 8TX, Merseyside, England
[2] Chinese Acad Sci, Ningbo Inst Mat Technol & Engn, Cixi Inst Biomed Engn, Ningbo 315201, Peoples R China
[3] China IntelliCloud Co, Shanghai, Peoples R China
[4] Remark AI UK Ltd, London SE1 9PD, England
Keywords
Three-dimensional displays; Feature extraction; Shape; Training; Correlation; Solid modeling; Videos; Human pose; temporal estimation; uncertainty
DOI
10.1109/TMM.2022.3167887
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Estimating 3D human body shapes and poses from videos is a challenging computer vision task. The intrinsic temporal information embedded in adjacent frames helps make accurate estimations. Existing approaches learn temporal features of the target frames simply by aggregating features of their adjacent frames using off-the-shelf deep neural networks. Consequently, these approaches cannot explicitly and effectively use the correlations between adjacent frames to help infer the parameters of the target frames. In this paper, we propose a novel framework that measures the correlations among adjacent frames in the form of an estimated confidence metric. The confidence value indicates to what extent the adjacent frames can help predict the target frames' 3D shapes and poses. Based on the estimated confidence values, temporally aggregated features are obtained by adaptively allocating different weights to the temporally predicted features from the adjacent frames. The final 3D shapes and poses are then regressed from the temporally aggregated features. Experimental results on three benchmark datasets show that the proposed method outperforms state-of-the-art approaches, even without motion priors in training. In particular, the proposed method is more robust against corrupted frames.
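The abstract describes weighting each adjacent frame's predicted features by an estimated confidence and summing them. A minimal sketch of that general idea follows, assuming (as an illustration only) that raw confidence scores are normalised with a softmax and that aggregation is a plain weighted sum of feature vectors. The function names `softmax` and `aggregate_temporal_features`, the softmax normalisation, and the toy numbers are hypothetical; the abstract does not specify how confidences are computed or normalised, so this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    if not scores:
        raise ValueError("softmax needs at least one score")
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_temporal_features(
    predicted_feats: Sequence[Sequence[float]],
    confidences: Sequence[float],
) -> List[float]:
    """Confidence-weighted sum of per-frame predicted features (hypothetical).

    predicted_feats[i] is the feature vector that adjacent frame i predicts
    for the target frame; confidences[i] is its raw confidence score.
    ASSUMPTION: scores are turned into weights with a softmax so they sum to 1.
    """
    if len(predicted_feats) != len(confidences):
        raise ValueError("need exactly one confidence score per adjacent frame")
    weights = softmax(confidences)
    dim = len(predicted_feats[0])
    out = [0.0] * dim
    for w, feat in zip(weights, predicted_feats):
        if len(feat) != dim:
            raise ValueError("all feature vectors must have the same length")
        for d in range(dim):
            out[d] += w * feat[d]
    return out


if __name__ == "__main__":
    # Toy example: the third adjacent frame is "corrupted" and gets a low score.
    feats = [[1.0, 2.0], [1.1, 2.1], [9.0, -5.0]]
    conf = [2.0, 2.0, -3.0]
    agg = aggregate_temporal_features(feats, conf)
    print([round(x, 2) for x in agg])  # prints [1.08, 2.03]
```

Because the low-confidence frame receives a weight of roughly 0.003, the aggregated vector stays close to the two clean frames, which mirrors the robustness to corrupted frames that the abstract claims for confidence-based weighting.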
Pages: 3868-3880
Number of pages: 13