SCALE-Pose: Skeletal Correction and Language Knowledge-assisted for 3D Human Pose Estimation

Citations: 0
Authors
Ma, Xinnan [1]
Li, Yaochen [1]
Zhao, Limeng [1]
Zhou, ChenXu [1]
Xu, Yuncheng [1]
Affiliations
[1] Xi'an Jiaotong University, School of Software Engineering, Xi'an 710049, People's Republic of China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT XI | 2025 / Vol. 15041
Keywords
3D human pose estimation; Transformer; Priori knowledge; Skeletal correction; Large language model;
DOI
10.1007/978-981-97-8795-1_39
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformer-based 3D human pose estimation methods typically take 2D joint sequences as input and use spatial and temporal transformer encoders to model the 3D human pose. However, these methods often fail to incorporate skeletal constraints that limit joint motion, and few integrate prior category knowledge to enhance potential joint representations. To address these problems, we propose a new method named SCALE-Pose. First, the method incorporates spatial and temporal skeleton correction blocks to better model long-range dependencies in the spatiotemporal motion of specific skeletons. Next, a four-stream radian loss based on skeleton angle error is introduced to constrain the motion space of joints. Finally, an auxiliary method uses global-local prompts from a large language model to generate prior category knowledge, improving how well that knowledge generalizes. Experimental results on the Human3.6M and MPI-INF-3DHP datasets demonstrate that our method outperforms existing approaches.
Pages: 578 - 592
Page count: 15
Related papers
50 in total
  • [21] A Novel Auxiliary Task Framework in 3D Human Pose Estimation for Opera Videos
    Cai, Xingquan
    Zhang, Haoyu
    He, Shanshan
    Song, Haoyu
    Sun, Haiyan
    PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 202 - 210
  • [22] Bidirectional temporal feature for 3D human pose and shape estimation from a video
    Sun, Libo
    Tang, Ting
    Qu, Yuke
    Qin, Wenhu
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2023, 34 (3-4)
  • [23] Frame-Padded Multiscale Transformer for Monocular 3D Human Pose Estimation
    Zhong, Yuanhong
    Yang, Guangxia
    Zhong, Daidi
    Yang, Xun
    Wang, Shanshan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 6191 - 6201
  • [24] STRFormer: Spatial-Temporal-ReTemporal Transformer for 3D human pose estimation
    Liu, Xing
    Tang, Hao
    IMAGE AND VISION COMPUTING, 2023, 140
  • [25] Global and Local Spatio-Temporal Encoder for 3D Human Pose Estimation
    Wang, Yong
    Kang, Hongbo
    Wu, Doudou
    Yang, Wenming
    Zhang, Longbin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4039 - 4049
  • [26] SCGFormer: Semantic Chebyshev Graph Convolution Transformer for 3D Human Pose Estimation
    Liang, Jiayao
    Yin, Mengxiao
    APPLIED SCIENCES-BASEL, 2024, 14 (04):
  • [27] Parallel-branch network for 3D human pose and shape estimation in video
    Wu, Yuanhao
    Wang, Chenxing
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2022, 33 (3-4)
  • [28] HOGFormer: high-order graph convolution transformer for 3D human pose estimation
    Xie, Yuhong
    Hong, Chaoqun
    Zhuang, Weiwei
    Liu, Lijuan
    Li, Jie
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2025, 16 (01) : 599 - 610
  • [29] Spatio-Temporal Dynamic Interlaced Network for 3D human pose estimation in video
    Xu, Feiyi
    Wang, Jifan
    Sun, Ying
    Qi, Jin
    Dong, Zhenjiang
    Sun, Yanfei
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
  • [30] Efficient Hierarchical Multi-view Fusion Transformer for 3D Human Pose Estimation
    Zhou, Kangkang
    Zhang, Lijun
    Lu, Feng
    Zhou, Xiang-Dong
    Shi, Yu
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 7512 - 7520