U-shaped spatial-temporal transformer network for 3D human pose estimation

Cited by: 7
Authors
Yang, Honghong [1 ,2 ]
Guo, Longfei [3 ]
Zhang, Yumei [1 ,3 ]
Wu, Xiaojun [1 ,3 ]
Affiliations
[1] Shaanxi Normal Univ, Key Lab Modern Teaching Technol, Minist Educ, Xian 710062, Peoples R China
[2] Minist Culture & Tourism, Key Lab Intelligent Comp & Serv Technol Folk Song, Xian, Peoples R China
[3] Shaanxi Normal Univ, Sch Comp Sci, Xian 710062, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Human pose estimation; Spatial-temporal transformer network; Multi-scale and multi-level feature representations; NEURAL-NETWORKS;
DOI
10.1007/s00138-022-01334-6
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
3D human pose estimation has made considerable progress with the development of convolutional neural networks, yet accurately estimating 3D joint locations from single-view images or videos remains challenging due to depth ambiguity and severe occlusion. Motivated by the effectiveness of vision transformers in computer vision tasks, we present a novel U-shaped spatial-temporal transformer-based network (U-STN) for 3D human pose estimation. The core idea of the proposed method is to process the human joints with a multi-scale, multi-level U-shaped transformer model. We construct a multi-scale architecture with three scales based on the human skeletal topology, in which local and global features are processed across the three scales under kinematic constraints. Furthermore, a multi-level feature representation is introduced by fusing intermediate features from different depths of the U-shaped network. With skeleton-constrained pooling and unpooling operations devised for U-STN, the network can transform features across scales and extract meaningful semantic features at all levels. Experiments on two challenging benchmark datasets show that the proposed method achieves good performance on 2D-to-3D pose estimation.
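The skeleton-constrained pooling and unpooling that the abstract describes can be pictured as merging joint features into coarser body-part nodes and broadcasting them back. The sketch below is an illustrative reconstruction only, not the paper's implementation: the three scales (17 joints, 5 body parts, 1 whole body), the joint grouping, and the function names are all assumptions based on a Human3.6M-style 17-joint layout.

```python
import numpy as np

# Hypothetical grouping of 17 joints into 5 body parts (assumed, not from the paper).
PART_GROUPS = [
    [0, 7, 8, 9, 10],   # pelvis, spine, thorax, neck, head
    [1, 2, 3],          # right leg
    [4, 5, 6],          # left leg
    [11, 12, 13],       # left arm
    [14, 15, 16],       # right arm
]

def skeletal_pool(feats, groups):
    """Average joint features within each skeletal group: (J, C) -> (G, C)."""
    return np.stack([feats[g].mean(axis=0) for g in groups])

def skeletal_unpool(group_feats, groups, num_joints):
    """Broadcast each group feature back to its member joints: (G, C) -> (J, C)."""
    out = np.zeros((num_joints, group_feats.shape[1]))
    for gi, g in enumerate(groups):
        out[g] = group_feats[gi]
    return out

# Example: 17 joints with 4-dimensional features.
x = np.random.randn(17, 4)
parts = skeletal_pool(x, PART_GROUPS)             # scale 2: (5, 4)
body = parts.mean(axis=0, keepdims=True)          # scale 3: (1, 4)
x_up = skeletal_unpool(parts, PART_GROUPS, 17)    # back to (17, 4)
```

In a U-shaped network, pooling of this kind would form the contracting path and unpooling the expanding path, with transformer blocks applied at each scale and intermediate features fused across depths.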
Pages: 16
Related papers
50 records in total
  • [1] U-shaped spatial–temporal transformer network for 3D human pose estimation
    Honghong Yang
    Longfei Guo
    Yumei Zhang
    Xiaojun Wu
    Machine Vision and Applications, 2022, 33
  • [2] Multi-scale spatial-temporal transformer for 3D human pose estimation
    Wu, Yongpeng
    Gao, Junna
    2021 5TH INTERNATIONAL CONFERENCE ON VISION, IMAGE AND SIGNAL PROCESSING (ICVISP 2021), 2021, : 242 - 247
  • [3] 3D Human Pose Estimation in Video with Temporal and Spatial Transformer
    Peng, Sha
    Hu, Jiwei
    Proceedings of SPIE - The International Society for Optical Engineering, 2023, 12707
  • [4] Vertex position estimation with spatial-temporal transformer for 3D human reconstruction
    Zhang, Xiangjun
    Zheng, Yinglin
    Deng, Wenjin
    Dai, Qifeng
    Lin, Yuxin
    Shi, Wangzheng
    Zeng, Ming
    GRAPHICAL MODELS, 2023, 130
  • [5] STRFormer: Spatial-Temporal-ReTemporal Transformer for 3D human pose estimation
    Liu, Xing
    Tang, Hao
    IMAGE AND VISION COMPUTING, 2023, 140
  • [6] Weakly-Supervised 3D Human Pose Estimation With Cross-View U-Shaped Graph Convolutional Network
    Hua, Guoliang
    Liu, Hong
    Li, Wenhao
    Zhang, Qian
    Ding, Runwei
    Xu, Xin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1832 - 1843
  • [7] Kinematics-aware spatial-temporal feature transform for 3D human pose estimation
    Du, Songlin
    Yuan, Zhiwei
    Ikenaga, Takeshi
    PATTERN RECOGNITION, 2024, 150
  • [8] Hierarchical Spatial-Temporal Adaptive Graph Fusion for Monocular 3D Human Pose Estimation
    Zhang, Lijun
    Lu, Feng
    Zhou, Kangkang
    Zhou, Xiang-Dong
    Shi, Yu
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 61 - 65
  • [9] 3D Human Pose Estimation with Spatial and Temporal Transformers
    Zheng, Ce
    Zhu, Sijie
    Mendieta, Matias
    Yang, Taojiannan
    Chen, Chen
    Ding, Zhengming
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11636 - 11645
  • [10] UViT: Efficient and lightweight U-shaped hybrid vision transformer for human pose estimation
    Li B.
    Tang S.
    Li W.
    Journal of Intelligent and Fuzzy Systems, 2024, 46 (04): : 8345 - 8359