SSMAN: self-supervised masked adaptive network for 3D human pose estimation

Cited by: 1
Authors
Shi, Yu [1]
Yue, Tianyi [1]
Zhao, Hu [1]
He, Guoping [1]
Ren, Keyan [1]
Affiliation
[1] Beijing University of Technology, Faculty of Information Technology, 100 Pingleyuan, Beijing 100124, People's Republic of China
Keywords
Deep learning; Human pose estimation; Adaption ability; Self-supervised learning
DOI
10.1007/s00138-024-01514-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Modern deep learning-based models for 3D human pose estimation from monocular images always lack the ability to adapt between occlusion and non-occlusion scenarios. This may limit the performance of current methods under varying degrees of occlusion. To tackle this problem, we propose a novel network called the self-supervised masked adaptive network (SSMAN). First, we apply masks at different levels to cover the diversity of occlusions in fully in-the-wild environments. Then, we design a multi-line adaptive network that can be trained in parallel on images masked at various scales. We train this masked adaptive network with self-supervised learning to enforce consistency across its outputs under different mask ratios. Furthermore, we propose a global refinement module that leverages global features of the human body to refine the pose estimated solely from local features. We perform extensive experiments on both occlusion datasets, such as 3DPW-OCC and OCHuman, and general datasets, such as Human3.6M and 3DPW. The results show that SSMAN achieves new state-of-the-art performance on both lightly and heavily occluded benchmarks. It also remains highly competitive on standard benchmarks, with significant improvements.
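The abstract describes two linked ideas: training parallel branches on images masked at different ratios, and enforcing consistency across their outputs. The Python sketch below illustrates both in their simplest form and is not the authors' implementation. Several details are assumptions made only for illustration:

- random masking at the patch level;
- zeroing the features of masked patches to simulate occlusion;
- a pairwise mean squared joint distance as the consistency term.

The helper names (`random_patch_mask`, `apply_mask`, `consistency_loss`) are hypothetical.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) coordinates of one 3D joint


def random_patch_mask(num_patches: int, mask_ratio: float, rng: random.Random) -> List[bool]:
    """Return a mask with round(mask_ratio * num_patches) entries set to True (masked).

    Assumption: occlusion is simulated by masking randomly chosen image patches.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    num_masked = round(mask_ratio * num_patches)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def apply_mask(patches: Sequence[Sequence[float]], mask: Sequence[bool]) -> List[List[float]]:
    """Zero out the features of masked patches (hypothetical occlusion model)."""
    return [[0.0] * len(p) if m else list(p) for p, m in zip(patches, mask)]


def mean_joint_sq_dist(a: Sequence[Joint], b: Sequence[Joint]) -> float:
    """Mean squared Euclidean distance between corresponding joints of two poses."""
    if len(a) != len(b):
        raise ValueError("poses must have the same number of joints")
    total = sum(sum((p - q) ** 2 for p, q in zip(ja, jb)) for ja, jb in zip(a, b))
    return total / len(a)


def consistency_loss(branch_poses: Sequence[Sequence[Joint]]) -> float:
    """Average pairwise disagreement between poses predicted under different mask ratios.

    Assumption: consistency is measured over every pair of branches; the
    paper's exact loss formulation is not given in this record.
    """
    pairs = list(itertools.combinations(branch_poses, 2))
    if not pairs:
        return 0.0
    return sum(mean_joint_sq_dist(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    patches = [[1.0, 2.0]] * 16  # 16 toy patch features
    for ratio in (0.0, 0.25, 0.5):  # one hypothetical branch per mask ratio
        mask = random_patch_mask(len(patches), ratio, rng)
        print(ratio, sum(mask))  # number of masked patches: 0, 4, 8
    pose_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    pose_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
    print(consistency_loss([pose_a, pose_b]))  # 0.5
```

In a full training loop, each branch would presumably run the (omitted) pose network on its own masked input. This consistency term would then be combined with whatever pose supervision the method uses. The weighting, the mask ratios, and the way SSMAN's global refinement module interacts with this loss are not specified here.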
Pages: 14