HDPose: Post-Hierarchical Diffusion with Conditioning for 3D Human Pose Estimation

Cited by: 1
Authors
Lee, Donghoon [1]
Kim, Jaeho [2]
Affiliations
[1] Sejong Univ, Dept Informat & Commun Engn, Seoul 05006, South Korea
[2] Sejong Univ, Dept Elect Engn, Seoul 05006, South Korea
Keywords
3D human pose estimation; diffusion; transformer; hierarchical structure
DOI
10.3390/s24030829
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Recently, monocular 3D human pose estimation (HPE) methods have been used to accurately predict 3D poses by solving the ill-posed problem caused by 3D-to-2D projection. However, monocular 3D HPE remains challenging owing to inherent depth ambiguity and occlusions. To address this issue, previous studies have proposed diffusion model-based approaches (DDPM) that learn to reconstruct a correct 3D pose from a noisy initial 3D pose. In addition, these approaches use 2D keypoints or context encoders that encode spatial and temporal information to inform the model. However, they often fall short of peak performance or require an extended period to converge to the target pose. In this paper, we propose HDPose, which converges rapidly and predicts 3D poses accurately. Our approach aggregates spatial and temporal information from the condition into a denoising model in a hierarchical structure. We observe that the post-hierarchical structure achieves the best performance among the various condition structures. Further, we evaluate our model on the widely used Human3.6M and MPI-INF-3DHP datasets. The proposed model demonstrates performance competitive with state-of-the-art models, achieving high accuracy with faster convergence while being considerably more lightweight.
Pages: 17