Hierarchical Spatial-Temporal Adaptive Graph Fusion for Monocular 3D Human Pose Estimation

Cited by: 2
Authors
Zhang, Lijun [1 ,2 ]
Lu, Feng [3 ,4 ]
Zhou, Kangkang [1 ,2 ]
Zhou, Xiang-Dong [1 ,2 ]
Shi, Yu [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Chongqing Inst Green & Intelligent Technol, Chongqing 400714, Peoples R China
[2] Univ Chinese Acad Sci, Chongqing Sch, Chongqing 400714, Peoples R China
[3] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing 100190, Peoples R China
[4] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
3D human pose estimation; attention mechanism; graph convolutional network; spatial-temporal fusion; TRANSFORMER; NETWORK;
DOI
10.1109/LSP.2023.3339060
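Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code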
0808 ; 0809 ;
Abstract
Single-view 3D human pose estimation (HPE) based on Graph Convolutional Networks currently suffers from problems such as insufficient feature representation and depth ambiguity. To address these issues, this letter proposes a hierarchical spatial-temporal adaptive graph fusion framework to improve monocular 3D HPE performance. Firstly, to enhance the spatial semantic feature representation of human nodes, a progressive adaptive graph feature capture strategy is developed, which adaptively constructs global-to-local attention graph features of all human joints in a coarse-to-fine manner. A spatial-temporal attention fusion module is then constructed to model long-term sequential dependencies and mitigate depth ambiguity. The temporal attention factors of related frames are captured and utilized to intermediately supervise the joint depth. The spatial semantic information among all joints in the same frame and temporal contextual knowledge of the joints across relevant frames are fused to build spatial-temporal correlations and optimize the final features. Extensive experiments on two popular benchmarks show that our method outperforms several state-of-the-art approaches and improves 3D HPE performance.
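The fusion idea in the abstract (attention among joints within one frame, combined with attention over the same joint across frames) can be illustrated with a minimal sketch. This is not the letter's implementation: it uses plain scaled dot-product attention with no learned weights, omits the graph convolution and depth supervision entirely, and fuses the two branches by simple averaging, which is an assumption made only for illustration. The function names `attend` and `fuse_spatial_temporal` are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(items):
    # Self-attention without learned projections: each vector becomes a
    # softmax-weighted average of all vectors, weighted by scaled dot product.
    d = len(items[0])
    out = []
    for q in items:
        w = softmax([dot(q, k) / math.sqrt(d) for k in items])
        out.append([sum(wi * v[c] for wi, v in zip(w, items)) for c in range(d)])
    return out

def fuse_spatial_temporal(seq):
    # seq[t][j] is the feature vector of joint j in frame t.
    T, J = len(seq), len(seq[0])
    # Spatial branch: attend over all joints within each frame.
    spatial = [attend(frame) for frame in seq]
    # Temporal branch: attend over the same joint across all frames.
    temporal = [attend([seq[t][j] for t in range(T)]) for j in range(J)]
    # Fusion by element-wise averaging (illustrative assumption only).
    return [[[0.5 * (s + temporal[j][t][c])
              for c, s in enumerate(spatial[t][j])]
             for j in range(J)]
            for t in range(T)]
```

Calling `fuse_spatial_temporal` on a nested list of shape frames × joints × channels returns a list of the same shape, where each joint feature mixes information from the other joints in its frame and from its own trajectory over time.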
Pages: 61 - 65
Page count: 5
Related papers
50 in total
  • [1] SPATIO-TEMPORAL ATTENTION GRAPH FOR MONOCULAR 3D HUMAN POSE ESTIMATION
    Zhang, Lijun
    Shao, Xiaohu
    Li, Zhenghao
    Zhou, Xiang-Dong
    Shi, Yu
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1231 - 1235
  • [2] Exploiting Spatial-temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks
    Cai, Yujun
    Ge, Liuhao
    Liu, Jun
    Cai, Jianfei
    Cham, Tat-Jen
    Yuan, Junsong
    Thalmann, Nadia Magnenat
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 2272 - 2281
  • [3] Multi-scale spatial-temporal transformer for 3D human pose estimation
    Wu, Yongpeng
    Gao, Junna
    2021 5TH INTERNATIONAL CONFERENCE ON VISION, IMAGE AND SIGNAL PROCESSING (ICVISP 2021), 2021, : 242 - 247
  • [4] On the Effect of Temporal Information on Monocular 3D Human Pose Estimation
    Brauer, Juergen
    Gong, Wenjuan
    Gonzalez, Jordi
    Arens, Michael
    2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCV WORKSHOPS), 2011,
  • [5] Personalized Graph Generation for Monocular 3D Human Pose and Shape Estimation
    Hu, Junxing
    Zhang, Hongwen
    Wang, Yunlong
    Ren, Min
    Sun, Zhenan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (04) : 2399 - 2413
  • [6] Kinematics-aware spatial-temporal feature transform for 3D human pose estimation
    Du, Songlin
    Yuan, Zhiwei
    Ikenaga, Takeshi
    PATTERN RECOGNITION, 2024, 150
  • [7] U-shaped spatial-temporal transformer network for 3D human pose estimation
    Yang, Honghong
    Guo, Longfei
    Zhang, Yumei
    Wu, Xiaojun
    MACHINE VISION AND APPLICATIONS, 2022, 33 (06)
  • [8] 3D Human Pose Estimation with Spatial and Temporal Transformers
    Zheng, Ce
    Zhu, Sijie
    Mendieta, Matias
    Yang, Taojiannan
    Chen, Chen
    Ding, Zhengming
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11636 - 11645
  • [9] Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation
    Honari, Sina
    Constantin, Victor
    Rhodin, Helge
    Salzmann, Mathieu
    Fua, Pascal
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (05) : 6415 - 6427
  • [10] A survey on monocular 3D human pose estimation
    Ji, X.
    Fang, Q.
    Dong, J.
    Shuai, Q.
    Jiang, W.
    Zhou, X.
    Virtual Reality and Intelligent Hardware, 2020, 2 (06) : 471 - 500