3D Human Pose and Shape Reconstruction From Videos via Confidence-Aware Temporal Feature Aggregation

Cited by: 5
Authors
Zhang, Hongrun [1]
Meng, Yanda [1]
Zhao, Yitian [2]
Qian, Xuesheng [3]
Qiao, Yihong [3]
Yang, Xiaoyun [4]
Zheng, Yalin [1]
Affiliations
[1] Univ Liverpool, Inst Life Course & Med Sci, Liverpool L7 8TX, Merseyside, England
[2] Chinese Acad Sci, Ningbo Inst Mat Technol & Engn, Cixi Inst Biomed Engn, Ningbo 315201, Peoples R China
[3] China IntelliCloud Co, Shanghai, Peoples R China
[4] Remark AI UK Ltd, London SE1 9PD, England
Keywords
Three-dimensional displays; Feature extraction; Shape; Training; Correlation; Solid modeling; Videos; Human pose; temporal estimation; uncertainty
DOI
10.1109/TMM.2022.3167887
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Estimating 3D human body shapes and poses from videos is a challenging computer vision task. The intrinsic temporal information embedded in adjacent frames is helpful in making accurate estimations. Existing approaches learn temporal features of the target frames simply by aggregating features of their adjacent frames, using off-the-shelf deep neural networks. Consequently, these approaches cannot explicitly and effectively use the correlations between adjacent frames to help infer the parameters of the target frames. In this paper, we propose a novel framework that can measure the correlations amongst adjacent frames in the form of an estimated confidence metric. The confidence value indicates to what extent the adjacent frames can help predict the target frames' 3D shapes and poses. Based on the estimated confidence values, temporally aggregated features are then obtained by adaptively allocating different weights to the temporal predicted features from the adjacent frames. The final 3D shapes and poses are estimated by regressing from the temporally aggregated features. Experimental results on three benchmark datasets show that the proposed method outperforms state-of-the-art approaches (even without the motion priors involved in training). In particular, the proposed method is more robust against corrupted frames.
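The confidence-weighted aggregation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says only that weights are allocated adaptively from estimated confidence values, so the softmax normalisation below is an assumption, and the names `softmax` and `aggregate_features` are hypothetical. In the paper the confidences are estimated by a network; here they are simply given as numbers.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax; the returned weights sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_features(
    frame_features: Sequence[Sequence[float]],
    confidences: Sequence[float],
) -> List[float]:
    """Weighted sum of per-frame feature vectors.

    Assumption: confidences are turned into weights with a softmax.
    A frame with low confidence contributes little to the result.
    """
    if len(frame_features) != len(confidences):
        raise ValueError("need exactly one confidence value per frame")
    weights = softmax(confidences)
    dim = len(frame_features[0])
    return [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]


# Three adjacent frames; the last one is treated as corrupted
# and is therefore given a much lower confidence score.
features = [[1.0, 2.0], [1.2, 2.1], [50.0, -40.0]]
confidences = [2.0, 2.0, -6.0]
aggregated = aggregate_features(features, confidences)
```

With these toy numbers the aggregated vector stays close to the two clean frames, which mirrors the abstract's claim of robustness against corrupted frames. A regressor would then map the aggregated features to shape and pose parameters.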
Pages: 3868-3880
Page count: 13
Related papers
60 in total
  • [21] Towards Accurate Marker-less Human Shape and Pose Estimation over Time
    Huang, Yinghao
    Bogo, Federica
    Lassner, Christoph
    Kanazawa, Angjoo
    Gehler, Peter V.
    Romero, Javier
    Akhter, Ijaz
    Black, Michael J.
    [J]. PROCEEDINGS 2017 INTERNATIONAL CONFERENCE ON 3D VISION (3DV), 2017, : 421 - 430
  • [22] Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments
    Ionescu, Catalin
    Papava, Dragos
    Olaru, Vlad
    Sminchisescu, Cristian
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (07) : 1325 - 1339
  • [23] Johnson S, 2010, BMVC, DOI 10.5244/C.24.12
  • [24] Hybrid Refinement-Correction Heatmaps for Human Pose Estimation
    Kamel, Aouaidjia
    Sheng, Bin
    Li, Ping
    Kim, Jinman
    Feng, David Dagan
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 1330 - 1342
  • [25] Learning 3D Human Dynamics from Video
    Kanazawa, Angjoo
    Zhang, Jason Y.
    Felsen, Panna
    Malik, Jitendra
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 5597 - 5606
  • [26] End-to-end Recovery of Human Shape and Pose
    Kanazawa, Angjoo
    Black, Michael J.
    Jacobs, David W.
    Malik, Jitendra
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7122 - 7131
  • [27] Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
    Kendall, Alex
    Gal, Yarin
    Cipolla, Roberto
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7482 - 7491
  • [28] Kendall Alex, 2017, ADV NEURAL INFORM PR, V30
  • [29] VIBE: Video Inference for Human Body Pose and Shape Estimation
    Kocabas, Muhammed
    Athanasiou, Nikos
    Black, Michael J.
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 5252 - 5262
  • [30] Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop
    Kolotouros, Nikos
    Pavlakos, Georgios
    Black, Michael J.
    Daniilidis, Kostas
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 2252 - 2261