Single-Stage is Enough: Multi-Person Absolute 3D Pose Estimation

Cited by: 23
Authors
Jin, Lei [1 ]
Xu, Chenyang [1 ]
Wang, Xiaojuan [1 ]
Xiao, Yabo [1 ]
Guo, Yandong [2 ]
Nie, Xuecheng [3 ]
Zhao, Jian [4 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[2] OPPO Res Inst, Beijing, Peoples R China
[3] Natl Univ Singapore, Singapore, Singapore
[4] Inst North Elect Equipment, Beijing, Peoples R China
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2022
DOI
10.1109/CVPR52688.2022.01274
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing multi-person absolute 3D pose estimation methods are mainly based on a two-stage paradigm, i.e., top-down or bottom-up, leading to redundant pipelines with high computation cost. We argue that it is more desirable to simplify this two-stage paradigm to a single-stage one to promote both efficiency and performance. To this end, we present an efficient single-stage solution, the Decoupled Regression Model (DRM), with three distinct novelties. First, DRM introduces a new decoupled representation for 3D pose, which expresses the 2D pose in the image plane and the depth information of each 3D human instance via a 2D center point (the center of visible keypoints) and a root point (the pelvis), respectively. Second, to learn a better feature representation for human depth regression, DRM introduces a 2D Pose-guided Depth Query Module (PDQM) that extracts features from the 2D pose regression branch, enabling the depth regression branch to perceive the scale of each instance. Third, DRM leverages a Decoupled Absolute Pose Loss (DAPL) to facilitate absolute root depth and root-relative depth estimation, thus improving the accuracy of the absolute 3D pose. Comprehensive experiments on challenging benchmarks including MuPoTS-3D and Panoptic clearly verify the superiority of our framework, which outperforms the state-of-the-art bottom-up absolute 3D pose estimation methods.
Pages: 13076-13085
Page count: 10
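
The abstract decouples each person's absolute 3D pose into a 2D pose in the image plane, an absolute root (pelvis) depth, and root-relative joint depths. Below is a minimal sketch of how such a decoupled prediction could be recombined into absolute 3D camera coordinates via standard pinhole back-projection; this is an illustration only, not the paper's implementation, and the function name, argument layout, and intrinsics handling are assumptions.

import numpy as np

def compose_absolute_pose(pose_2d, root_depth, rel_depths, K):
    # pose_2d:    (J, 2) predicted keypoint pixel coordinates
    # root_depth: scalar absolute depth of the root (pelvis), in meters
    # rel_depths: (J,) root-relative depth offset of each joint (0 for the root)
    # K:          (3, 3) camera intrinsic matrix
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    z = root_depth + rel_depths            # absolute depth of each joint
    x = (pose_2d[:, 0] - cx) / fx * z      # pinhole back-projection in x
    y = (pose_2d[:, 1] - cy) / fy * z      # pinhole back-projection in y
    return np.stack([x, y, z], axis=-1)    # (J, 3) absolute 3D pose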