3D Human Pose Machines with Self-Supervised Learning

Cited by: 50
Authors
Wang, Keze [1, 2]
Lin, Liang [1]
Jiang, Chenhan [1]
Qian, Chen [3]
Wei, Pengxu [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou, Peoples R China
[2] Minist Educ, Engn Res Ctr Adv Comp Engn Software, Beijing, Peoples R China
[3] SenseTime Grp, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Two dimensional displays; Pose estimation; Solid modeling; Task analysis; Deep learning; Feature extraction; Human pose estimation; convolutional neural networks; spatio-temporal modeling; self-supervised learning; geometric deep learning;
DOI
10.1109/TPAMI.2019.2892452
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Driven by recent computer vision and robotic applications, recovering 3D human poses has become increasingly important and has attracted growing interest. This task is challenging because of the diverse appearances, viewpoints, occlusions and inherent geometric ambiguities in monocular images. Most existing methods design elaborate priors/constraints to directly regress 3D human poses from the corresponding 2D pose-aware features or 2D pose predictions. However, because 3D pose training data are insufficient and a domain gap exists between 2D space and 3D space, these methods have limited scalability to practical scenarios (e.g., outdoor scenes). To address this issue, this paper proposes a simple yet effective self-supervised correction mechanism to learn all the intrinsic structures of human poses from abundant images. Specifically, the proposed mechanism involves two dual learning tasks, i.e., 2D-to-3D pose transformation and 3D-to-2D pose projection, which serve as a bridge between 3D and 2D human poses and provide a form of "free" self-supervision for accurate 3D human pose estimation. The 2D-to-3D pose transformation sequentially regresses intermediate 3D poses by transforming the pose representation from the 2D domain to the 3D domain under sequence-dependent temporal context, while the 3D-to-2D pose projection refines the intermediate 3D poses by maintaining geometric consistency between the 2D projections of the 3D poses and the estimated 2D poses. These two dual learning tasks therefore enable our model to adaptively learn from 3D human pose data and external large-scale 2D human pose data. We further apply our self-supervised correction mechanism to develop a 3D human pose machine, which jointly integrates 2D spatial relationships, temporal smoothness of predictions and 3D geometric knowledge.
Extensive evaluations on the Human3.6M and HumanEva-I benchmarks demonstrate the superior performance and efficiency of our framework over all compared methods.
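The geometric consistency described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the projection model, so the sketch assumes a simple pinhole camera, and the function names (`project_to_2d`, `consistency_loss`) and the `focal` and `center` values are hypothetical. It only shows how a 3D pose might be projected to 2D and compared against an estimated 2D pose.

```python
import numpy as np


def project_to_2d(pose_3d, focal=1000.0, center=(500.0, 500.0)):
    """Project (J, 3) camera-space joints to (J, 2) pixel coordinates.

    Assumption: an ideal pinhole camera with a hypothetical focal length
    and principal point; the paper's projection may differ.
    """
    pose_3d = np.asarray(pose_3d, dtype=float)
    xy = pose_3d[:, :2]   # x and y coordinates of each joint
    z = pose_3d[:, 2:3]   # depth, kept 2D so it broadcasts over x and y
    return focal * xy / z + np.asarray(center, dtype=float)


def consistency_loss(pose_3d, pose_2d_est, focal=1000.0, center=(500.0, 500.0)):
    """Mean squared pixel distance between projected and estimated 2D joints."""
    projected = project_to_2d(pose_3d, focal=focal, center=center)
    diff = projected - np.asarray(pose_2d_est, dtype=float)
    return float(np.mean(np.sum(diff ** 2, axis=1)))


if __name__ == "__main__":
    # Two toy joints, both 2 units in front of the camera.
    pose_3d = [[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]]
    # Estimated 2D pose that disagrees with the projection at the first joint.
    pose_2d_est = [[503.0, 504.0], [1000.0, 500.0]]
    print(consistency_loss(pose_3d, pose_2d_est))
```

A loss of this kind needs only 2D pose annotations or predictions, which is consistent with the abstract's point that the projection task lets the model learn from large-scale 2D pose data without 3D labels.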
Pages: 1069 - 1082
Number of pages: 14
Related papers
50 in total
  • [1] Multi-View 3D Human Pose Estimation with Self-Supervised Learning
    Chang, Inho
    Park, Min-Gyu
    Kim, Jaewoo
    Yoon, Ju Hong
    3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021), 2021, : 255 - 257
  • [2] Self-supervised 3D human pose estimation from video
    Gholami, Mohsen
    Rezaei, Ahmad
    Rhodin, Helge
    Ward, Rabab
    Wang, Z. Jane
    NEUROCOMPUTING, 2022, 488 : 97 - 106
  • [3] Self-Supervised Learning of 3D Human Pose using Multi-view Geometry
    Kocabas, Muhammed
    Karagoz, Salih
    Akbas, Emre
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1077 - 1086
  • [4] Rotated Orthographic Projection for Self-supervised 3D Human Pose Estimation
    Yao, Yao
    Pan, Yixuan
    Shi, Wenjun
    Zhu, Dongchen
    Wang, Lei
    Li, Jiamao
    COMPUTER VISION - ECCV 2024, PT LXIX, 2025, 15127 : 422 - 439
  • [5] CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild
    Wandt, Bastian
    Rudolph, Marco
    Zell, Petrissa
    Rhodin, Helge
    Rosenhahn, Bodo
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 13289 - 13299
  • [6] MAPConNet: Self-supervised 3D Pose Transfer with Mesh and Point Contrastive Learning
    Sun, Jiaze
    Chen, Zhixiang
    Kim, Tae-Kyun
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 14406 - 14416
  • [7] Ssman: self-supervised masked adaptive network for 3D human pose estimation
    Yu Shi
    Tianyi Yue
    Hu Zhao
    Guoping He
    Keyan Ren
    Machine Vision and Applications, 2024, 35
  • [8] Ssman: self-supervised masked adaptive network for 3D human pose estimation
    Shi, Yu
    Yue, Tianyi
    Zhao, Hu
    He, Guoping
    Ren, Keyan
    MACHINE VISION AND APPLICATIONS, 2024, 35 (03)
  • [9] Self-Supervised 3D Human Pose Estimation with Multiple-View Geometry
    Bouazizi, Arij
    Wiederer, Julian
    Kressel, Ulrich
    Belagiannis, Vasileios
    2021 16TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2021), 2021,
  • [10] Geometry-Driven Self-Supervised Method for 3D Human Pose Estimation
    Li, Yang
    Li, Kan
    Jiang, Shuai
    Zhang, Ziyue
    Huang, Congzhentao
    Xu, Richard Yi Da
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11442 - 11449