SSMAN: self-supervised masked adaptive network for 3D human pose estimation

Cited by: 1

Authors:

Shi, Yu [1]

Yue, Tianyi [1]

Zhao, Hu [1]

He, Guoping [1]

Ren, Keyan [1]

Affiliations:

[1] Beijing Univ Technol, Fac Informat Technol, 100 Pingleyuan, Beijing 100124, Peoples R China

Source:

MACHINE VISION AND APPLICATIONS | 2024 / Vol. 35 / Issue 03

Keywords:

Deep learning; Human pose estimation; Adaption ability; Self-supervised learning;

DOI:

10.1007/s00138-024-01514-6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Modern deep learning-based models for 3D human pose estimation from monocular images often lack the ability to adapt between occluded and non-occluded scenarios, which can restrict their performance under varying degrees of occlusion. To tackle this problem, we propose a novel network called the self-supervised masked adaptive network (SSMAN). First, we leverage different levels of masks to cover the richness of occlusion in fully in-the-wild environments. Then, we design a multi-line adaptive network that can be trained with images masked at various scales in parallel. We train this masked adaptive network with self-supervised learning to enforce consistency across its outputs under different mask ratios. Furthermore, a global refinement module is proposed that leverages global features of the human body to refine the pose estimated solely from local features. We perform extensive experiments on both occlusion datasets such as 3DPW-OCC and OCHuman and general datasets such as Human3.6M and 3DPW. The results show that SSMAN achieves new state-of-the-art performance on both lightly and heavily occluded benchmarks and is highly competitive, with significant improvement, on standard benchmarks.
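The idea of enforcing consistency across outputs under different mask ratios can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the mask shape, the network, or the loss. The patch-wise zero masking, the mean-squared pairwise loss, and the names `mask_patches`, `toy_pose_net`, and `consistency_loss` are all assumptions made purely for illustration, and `toy_pose_net` is a hypothetical stand-in for a real pose network.

```python
import random
from itertools import combinations


def mask_patches(image, patch, ratio, rng):
    """Zero out a random `ratio` of non-overlapping patch x patch blocks (assumed mask form)."""
    h, w = len(image), len(image[0])
    blocks = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    n_masked = round(ratio * len(blocks))
    masked = [row[:] for row in image]  # copy so the input stays untouched
    for r0, c0 in rng.sample(blocks, n_masked):
        for r in range(r0, min(r0 + patch, h)):
            for c in range(c0, min(c0 + patch, w)):
                masked[r][c] = 0.0
    return masked


def toy_pose_net(image):
    """Hypothetical stand-in for a pose network: one value per horizontal band of the image."""
    band = len(image) // 4
    pose = []
    for i in range(4):
        values = [v for row in image[i * band:(i + 1) * band] for v in row]
        pose.append(sum(values) / len(values))
    return pose


def consistency_loss(poses):
    """Mean squared difference averaged over all pairs of predictions (assumed loss form)."""
    pairs = list(combinations(poses, 2))
    total = 0.0
    for a, b in pairs:
        total += sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


rng = random.Random(0)
image = [[1.0] * 8 for _ in range(8)]
ratios = [0.0, 0.25, 0.5]  # illustrative mask ratios, one per "line" of the network

views = [mask_patches(image, 2, ratio, rng) for ratio in ratios]
poses = [toy_pose_net(view) for view in views]
loss = consistency_loss(poses)
print(f"consistency loss across {len(ratios)} mask ratios: {loss:.4f}")
```

The loss is zero only when every masked view yields the same prediction, so minimizing it would push a trainable network toward outputs that do not depend on how much of the input is hidden. No ground-truth pose is needed for this term, which is what makes it a self-supervised signal.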

Pages: 14

