Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation

被引:11
|
作者
Honari, Sina [1 ]
Constantin, Victor [1 ]
Rhodin, Helge [2 ]
Salzmann, Mathieu [1 ]
Fua, Pascal [1 ]
机构
[1] CVLab, EPFL, Lausanne, Switzerland
[2] Imager Lab, UBC, Vancouver, BC, Canada
关键词
Crops; Feature extraction; TV; Cameras; Three-dimensional displays; Image reconstruction; Unsupervised learning; Temporal feature extraction; unsupervised representation learning; contrastive learning; 3D human pose;
D O I
10.1109/TPAMI.2022.3215307
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this article we propose an unsupervised feature extraction method to capture temporal information on monocular videos, where we detect and encode subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally-distant ones as negative pairs as in other CSS approaches, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying contrastive loss only to the time-variant features and encouraging a gradual transition on them between nearby and away frames while also reconstructing the input, extract rich temporal features, well-suited for human pose estimation. Our approach reduces error by about 50% compared to the standard CSS strategies, outperforms other unsupervised single-view methods and matches the performance of multi-view techniques. When 2D pose is available, our approach can extract even richer latent features and improve the 3D pose estimation accuracy, outperforming other state-of-the-art weakly supervised methods.
引用
收藏
页码:6415 / 6427
页数:13
相关论文
共 50 条
  • [1] On the Effect of Temporal Information on Monocular 3D Human Pose Estimation
    Brauer, Juergen
    Gong, Wenjuan
    Gonzalez, Jordi
    Arens, Michael
    2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCV WORKSHOPS), 2011,
  • [2] LEARNING MONOCULAR 3D HUMAN POSE ESTIMATION WITH SKELETAL INTERPOLATION
    Chen, Ziyi
    Sugimoto, Akihiro
    Lai, Shang-Hong
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4218 - 4222
  • [3] A survey on monocular 3D human pose estimation
    Ji X.
    Fang Q.
    Dong J.
    Shuai Q.
    Jiang W.
    Zhou X.
    Virtual Reality and Intelligent Hardware, 2020, 2 (06): : 471 - 500
  • [4] Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos
    Cheng, Yu
    Wang, Bo
    Yang, Bo
    Tan, Robby T.
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1157 - 1165
  • [5] MONOCULAR 3D HUMAN POSE ESTIMATION BY CLASSIFICATION
    Greif, Thomas
    Lienhart, Rainer
    Sengupta, Debabrata
    2011 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2011,
  • [6] On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos
    Li, Zhi
    Wang, Xuan
    Wang, Fei
    Jiang, Peilin
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 2192 - 2201
  • [7] SPATIO-TEMPORAL ATTENTION GRAPH FOR MONOCULAR 3D HUMAN POSE ESTIMATION
    Zhang, Lijun
    Shao, Xiaohu
    Li, Zhenghao
    Zhou, Xiang-Dong
    Shi, Yu
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1231 - 1235
  • [8] Adapted human pose: monocular 3D human pose estimation with zero real 3D pose data
    Liu, Shuangjun
    Sehgal, Naveen
    Ostadabbas, Sarah
    APPLIED INTELLIGENCE, 2022, 52 (12) : 14491 - 14506
  • [9] Learning to Augment Poses for 3D Human Pose Estimation in Images and Videos
    Zhang, Jianfeng
    Gong, Kehong
    Wang, Xinchao
    Feng, Jiashi
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (08) : 10012 - 10026
  • [10] Adapted human pose: monocular 3D human pose estimation with zero real 3D pose data
    Shuangjun Liu
    Naveen Sehgal
    Sarah Ostadabbas
    Applied Intelligence, 2022, 52 : 14491 - 14506