SCALE-Pose: Skeletal Correction and Language Knowledge-assisted for 3D Human Pose Estimation

Cited by: 0

Authors:

Ma, Xinnan [1]

Li, Yaochen [1]

Zhao, Limeng [1]

Zhou, ChenXu [1]

Xu, Yuncheng [1]

Affiliations:

[1] Xi'an Jiaotong University, School of Software Engineering, Xi'an 710049, People's Republic of China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT XI | 2025 / Vol. 15041

Keywords:

3D human pose estimation; Transformer; Priori knowledge; Skeletal correction; Large language model

DOI:

10.1007/978-981-97-8795-1_39

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Transformer-based 3D human pose estimation methods typically take 2D joint sequences as input and use spatial and temporal transformer encoders to model the 3D human pose. However, these methods often neglect skeletal constraints that limit joint motion, and few integrate prior category knowledge to enhance potential joint representations. To address these problems, we propose a new method named SCALE-Pose. First, the method incorporates spatial and temporal skeleton correction blocks to better model the long-range dependencies in the spatiotemporal motion of specific skeletons. Next, a four-stream radian loss based on skeleton angle error is introduced to constrain the motion space of joints. Finally, an auxiliary method employs global-local prompts from a large language model to generate prior category knowledge, improving how well that knowledge generalizes. Experimental results on the Human3.6M and MPI-INF-3DHP datasets demonstrate that our method outperforms existing approaches.
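
The abstract does not define the four-stream radian loss, so the following is only a minimal sketch of the general idea of a skeleton angle error: it measures, in radians, how far each predicted bone direction deviates from the ground-truth bone direction and averages the result. The skeleton layout (`PARENTS`), the function names, and the single-stream averaging are all illustrative assumptions, not the authors' formulation.

```python
import math

# Hypothetical parent index per joint for a tiny 4-joint chain (an assumption,
# not the Human3.6M skeleton). Joint 0 is the root and has no parent.
PARENTS = [-1, 0, 1, 2]


def bone_vectors(joints, parents=PARENTS):
    """Return one vector per bone, pointing from the parent joint to the child."""
    bones = []
    for child, parent in enumerate(parents):
        if parent < 0:
            continue  # the root joint has no incoming bone
        bones.append(tuple(c - p for c, p in zip(joints[child], joints[parent])))
    return bones


def angle_between(u, v, eps=1e-8):
    """Angle in radians between two 3D vectors, with the cosine clamped to [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / max(norm, eps)))
    return math.acos(cos)


def bone_angle_error(pred, target, parents=PARENTS):
    """Mean angle (radians) between predicted and ground-truth bone directions."""
    pred_bones = bone_vectors(pred, parents)
    target_bones = bone_vectors(target, parents)
    angles = [angle_between(u, v) for u, v in zip(pred_bones, target_bones)]
    return sum(angles) / len(angles)


# Toy poses: the ground truth is a straight vertical chain, while the
# prediction bends the middle bone by 90 degrees.
target = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (0.0, 3.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (1.0, 2.0, 0.0)]
error = bone_angle_error(pred, target)
```

In this toy case one of the three bones is off by a right angle, so the mean error is one third of π/2. A training loss would typically compute the same quantity on batched tensors in a differentiable framework. How the paper splits it into four streams is not stated in this record.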


Pages: 578-592

Page count: 15

50 records in total

[31] Hierarchical Spatial-Temporal Adaptive Graph Fusion for Monocular 3D Human Pose Estimation
Zhang, Lijun
Lu, Feng
Zhou, Kangkang
Zhou, Xiang-Dong
Shi, Yu
IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 61 - 65
[32] Exploiting Static and Dynamic Human Joint Relations for 3D Pose Estimation via Cascade Transformers
Song, Bo
Ji, Changjiang
Fan, Shuo
2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022 : 4522 - 4528
[33] Multi-hypothesis representation learning for transformer-based 3D human pose estimation
Li, Wenhao
Liu, Hong
Tang, Hao
Wang, Pichao
PATTERN RECOGNITION, 2023, 141
[34] Learning the Dynamic Spatio-Temporal Relationship Between Joints for 3D Human Pose Estimation
Xu, Feiyi
Sun, Ying
Qi, Jin
Sun, Yanfei
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VI, 2025, 15036 : 269 - 284
[35] HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation
Cheng, Wencan
Kim, Eunji
Ko, Jong Hwan
COMPUTER VISION - ECCV 2024, PT LXXXVIII, 2025, 15146 : 35 - 52
[36] Corn pose estimation using 3D object detection and stereo images
Gao, Yuliang
Li, Zhen
Hong, Qingqing
Li, Bin
Zhang, Lifeng
COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2025, 231
[37] A Geometric Knowledge Oriented Single-Frame 2D-to-3D Human Absolute Pose Estimation Method
Hu, Mengxian
Liu, Chengju
Li, Shu
Yan, Qingqing
Fang, Qin
Chen, Qijun
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (12) : 7282 - 7295
[38] DBMHT: A double-branch multi-hypothesis transformer for 3D human pose estimation in video
Xiang, Xuezhi
Li, Xiaoheng
Bao, Weijie
Qiao, Yulong
El Saddik, Abdulmotaleb
COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249
[39] Hierarchical Local Temporal Network for 2D-to-3D Human Pose Estimation
Yan, Xin
Xie, Jiucheng
Liu, Mengqi
Li, Haolun
Gao, Hao
IEEE INTERNET OF THINGS JOURNAL, 2025, 12 (01) : 869 - 880
[40] 3D interacting hand pose and shape estimation from a single RGB image
Gao, Chengying
Yang, Yujia
Li, Wensheng
NEUROCOMPUTING, 2022, 474 : 25 - 36
