SCALE-Pose: Skeletal Correction and Language Knowledge-assisted for 3D Human Pose Estimation

Cited by: 0
Authors
Ma, Xinnan [1]
Li, Yaochen [1]
Zhao, Limeng [1]
Zhou, ChenXu [1]
Xu, Yuncheng [1]
Affiliation
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT XI | 2025 / Vol. 15041
Keywords
3D human pose estimation; Transformer; Priori knowledge; Skeletal correction; Large language model;
DOI
10.1007/978-981-97-8795-1_39
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Transformer-based 3D human pose estimation methods typically take 2D joint sequences as input and use spatial and temporal transformer encoders to model the 3D human pose. However, these methods often fail to incorporate skeletal constraints that limit joint motion, and few integrate prior category knowledge to enhance potential joint representations. To address these problems, we propose a new method named SCALE-Pose. First, the method incorporates spatial and temporal skeleton correction blocks to better model the long-range dependencies in the spatiotemporal motion of specific skeletons. Next, a four-stream radian loss based on skeleton angle error is introduced to constrain the motion space of joints. Finally, an auxiliary method uses global-local prompts from a large language model to generate prior category knowledge, improving how well that knowledge generalizes. Experimental results on the Human3.6M and MPI-INF-3DHP datasets demonstrate that our method outperforms existing approaches.
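The abstract does not define the four-stream radian loss, so the following is only an illustrative sketch of the general idea of a skeleton-angle error measured in radians. It is not the paper's formulation. The function names `joint_angle` and `angle_loss`, the joint-triple representation, and the toy three-joint chain are all assumptions made for this example.

```python
import math


def joint_angle(a, b, c):
    """Angle in radians at joint b, formed by the bones b->a and b->c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return 0.0  # degenerate (zero-length) bone: treat the angle as zero
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def angle_loss(pred, target, triples):
    """Mean absolute joint-angle error in radians over the given joint triples.

    Hypothetical helper: `triples` lists (a, b, c) joint indices, with the
    angle measured at the middle joint b.
    """
    errors = [
        abs(joint_angle(pred[a], pred[b], pred[c])
            - joint_angle(target[a], target[b], target[c]))
        for a, b, c in triples
    ]
    return sum(errors) / len(errors)


# Toy three-joint chain (hip, knee, ankle); coordinates are made up.
target = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # right angle at the knee
pred = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (0.0, -1.0, 0.0)]    # straight leg
loss = angle_loss(pred, target, [(0, 1, 2)])
```

A term like this penalizes predictions whose bone angles differ from the ground truth even when per-joint position error is small, which is one plausible way to restrict the motion space of joints.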
Pages: 578-592
Number of pages: 15