Multi-hop graph transformer network for 3D human pose estimation

Cited by: 4
Authors
Islam, Zaedul [1]
Ben Hamza, A. [1]
Affiliations
[1] Concordia Univ, Concordia Inst Informat Syst Engn, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
3D human pose estimation; Graph convolutional network; Transformer; Multi-hop; Dilated convolution;
DOI
10.1016/j.jvcir.2024.104174
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Accurate 3D human pose estimation is a challenging task due to occlusion and depth ambiguity. In this paper, we introduce a multi-hop graph transformer network designed for 2D-to-3D human pose estimation in videos by leveraging the strengths of multi-head self-attention and multi-hop graph convolutional networks with disentangled neighborhoods to capture spatio-temporal dependencies and handle long-range interactions. The proposed network architecture consists of a graph attention block composed of stacked layers of multi-head self-attention and graph convolution with learnable adjacency matrix, and a multi-hop graph convolutional block comprised of multi-hop convolutional and dilated convolutional layers. The combination of multi-head self-attention and multi-hop graph convolutional layers enables the model to capture both local and global dependencies, while the integration of dilated convolutional layers enhances the model's ability to handle spatial details required for accurate localization of the human body joints. Extensive experiments demonstrate the effectiveness and generalization ability of our model, achieving competitive performance on benchmark datasets.
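The abstract does not spell out how the "disentangled neighborhoods" are built. One common reading, assumed here and not confirmed by this record, is that the k-hop adjacency links only joints whose shortest-path distance is exactly k, so each hop contributes a distinct set of neighbors. The sketch below illustrates that reading in plain Python on a hypothetical 5-joint chain; the function names, the toy skeleton, and the scalar `hop_weights` (a stand-in for learnable weight matrices) are all illustrative assumptions, not the authors' implementation.

```python
from collections import deque


def hop_distances(num_nodes, edges):
    """All-pairs shortest-path hop counts on an undirected graph, via BFS."""
    neighbors = [[] for _ in range(num_nodes)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dist = [[None] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def k_hop_adjacency(dist, k):
    """Disentangled k-hop adjacency: 1 only where the hop distance is exactly k."""
    n = len(dist)
    return [[1 if dist[i][j] == k else 0 for j in range(n)] for i in range(n)]


def multi_hop_aggregate(features, dist, max_hop, hop_weights):
    """Sum, over hops 0..max_hop, the weighted mean of exact-k-hop neighbor features.

    hop_weights[k] is a scalar stand-in (assumption) for a learnable weight matrix.
    """
    n, dim = len(features), len(features[0])
    out = [[0.0] * dim for _ in range(n)]
    for k in range(max_hop + 1):
        a_k = k_hop_adjacency(dist, k)
        for i in range(n):
            degree = sum(a_k[i])
            if degree == 0:
                continue  # joint i has no neighbor at exactly k hops
            for j in range(n):
                if a_k[i][j]:
                    for d in range(dim):
                        out[i][d] += hop_weights[k] * features[j][d] / degree
    return out


# Hypothetical toy skeleton: five joints connected in a chain 0-1-2-3-4.
TOY_EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
TOY_DIST = hop_distances(5, TOY_EDGES)
TOY_FEATURES = [[1.0], [2.0], [3.0], [4.0], [5.0]]
TOY_OUTPUT = multi_hop_aggregate(TOY_FEATURES, TOY_DIST, 2, [1.0, 0.5, 0.25])
```

Because each hop uses its own exact-distance neighborhood, a joint's immediate neighbors are not counted again at hop 2, which is the property the term "disentangled" is taken to mean in this sketch.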
Pages: 12
Related papers
50 in total
  • [21] MGAPoseNet: multiscale graph-attention for 3D human pose estimation
    Liu, Minghao
    Wang, Wenshan
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (8-9) : 5589 - 5597
  • [22] DBMHT: A double-branch multi-hypothesis transformer for 3D human pose estimation in video
    Xiang, Xuezhi
    Li, Xiaoheng
    Bao, Weijie
    Qiao, Yulong
    El Saddik, Abdulmotaleb
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249
  • [23] STRFormer: Spatial-Temporal-ReTemporal Transformer for 3D human pose estimation
    Liu, Xing
    Tang, Hao
    IMAGE AND VISION COMPUTING, 2023, 140
  • [24] FMR-GNet: Forward Mix-Hop Spatial-Temporal Residual Graph Network for 3D Pose Estimation
    Yang, Honghong
    Liu, Hongxi
    Zhang, Yumei
    Wu, Xiaojun
    CHINESE JOURNAL OF ELECTRONICS, 2024, 33 (06) : 1346 - 1359
  • [25] Frame-Padded Multiscale Transformer for Monocular 3D Human Pose Estimation
    Zhong, Yuanhong
    Yang, Guangxia
    Zhong, Daidi
    Yang, Xun
    Wang, Shanshan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 6191 - 6201
  • [26] Hierarchical Spatial-Temporal Adaptive Graph Fusion for Monocular 3D Human Pose Estimation
    Zhang, Lijun
    Lu, Feng
    Zhou, Kangkang
    Zhou, Xiang-Dong
    Shi, Yu
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 61 - 65
  • [27] SlowFastFormer for 3D human pose estimation
    Zhou, Lu
    Chen, Yingying
    Wang, Jinqiao
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 243
  • [28] GraphMLP: A graph MLP-like architecture for 3D human pose estimation
    Li, Wenhao
    Liu, Mengyuan
    Liu, Hong
    Guo, Tianyu
    Wang, Ti
    Tang, Hao
    Sebe, Nicu
    PATTERN RECOGNITION, 2025, 158
  • [29] Spatio-Temporal Dynamic Interlaced Network for 3D human pose estimation in video
    Xu, Feiyi
    Wang, Jifan
    Sun, Ying
    Qi, Jin
    Dong, Zhenjiang
    Sun, Yanfei
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
  • [30] Progressive Multi-View Fusion for 3D Human Pose Estimation
    Zhang, Lijun
    Zhou, Kangkang
    Liu, Liangchen
    Li, Zhenghao
    Zhao, Xunyi
    Zhou, Xiang-Dong
    Shi, Yu
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1600 - 1604