SCALE-Pose: Skeletal Correction and Language Knowledge-assisted for 3D Human Pose Estimation

Cited by: 0
Authors
Ma, Xinnan [1]
Li, Yaochen [1]
Zhao, Limeng [1]
Zhou, ChenXu [1]
Xu, Yuncheng [1]
Affiliation
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT XI | 2025 / Vol. 15041
Keywords
3D human pose estimation; Transformer; Priori knowledge; Skeletal correction; Large language model;
DOI
10.1007/978-981-97-8795-1_39
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformer-based 3D human pose estimation methods typically take 2D joint sequences as input and use spatial and temporal transformer encoders to model the 3D human pose. However, these methods often omit skeletal constraints that limit joint motion, and few integrate prior category knowledge to enhance potential joint representations. To address these problems, we propose a new method named SCALE-Pose. First, the method incorporates spatial and temporal skeleton correction blocks to better model long-range dependencies in the spatiotemporal motion of specific skeletons. Next, a four-stream radian loss based on skeleton angle error is introduced to constrain the motion space of joints. Finally, an auxiliary method uses global-local prompts from a large language model to generate prior category knowledge, improving the ability to generalize that knowledge. Experimental results on the Human3.6M and MPI-INF-3DHP datasets show that our method outperforms existing approaches.
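The abstract does not define the four-stream radian loss, so the following is only a minimal sketch of the general idea of an angle-based skeletal error, not the authors' formulation. It assumes a hypothetical three-joint toy skeleton (`BONES`), measures the angle in radians between each predicted bone and the corresponding ground-truth bone, and averages the result. The function names and the skeleton layout are illustrative assumptions.

```python
import math

# Hypothetical toy skeleton: (parent, child) joint index pairs.
# This layout is an assumption for illustration, not taken from the paper.
BONES = [(0, 1), (1, 2)]


def bone_vector(joints, parent, child):
    """Return the 3D vector pointing from the parent joint to the child joint."""
    px, py, pz = joints[parent]
    cx, cy, cz = joints[child]
    return (cx - px, cy - py, cz - pz)


def angle_between(u, v, eps=1e-8):
    """Angle in radians between two 3D vectors; 0.0 if either is near zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm < eps:
        return 0.0
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / norm))
    return math.acos(cos)


def bone_angle_error(pred, target, bones=BONES):
    """Mean angular error (radians) between predicted and ground-truth bones."""
    errors = [
        angle_between(bone_vector(pred, p, c), bone_vector(target, p, c))
        for p, c in bones
    ]
    return sum(errors) / len(errors)


# Usage: the second bone of the prediction is rotated 90 degrees
# relative to the ground truth, while the first bone matches.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
error = bone_angle_error(pred, target)
```

An error term of this kind penalizes bone orientations directly rather than joint positions alone, which is one plausible way to restrict the motion space of joints. How the paper combines such terms into four streams is not stated in this record.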
Pages: 578-592
Number of pages: 15