Hierarchical Spatial-Temporal Adaptive Graph Fusion for Monocular 3D Human Pose Estimation

Times cited: 2

Authors:

Zhang, Lijun [1, 2]

Lu, Feng [3, 4]

Zhou, Kangkang [1, 2]

Zhou, Xiang-Dong [1, 2]

Shi, Yu [1, 2]

Affiliations:

[1] Chinese Acad Sci, Chongqing Inst Green & Intelligent Technol, Chongqing 400714, Peoples R China

[2] Univ Chinese Acad Sci, Chongqing Sch, Chongqing 400714, Peoples R China

[3] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing 100190, Peoples R China

[4] Peng Cheng Lab, Shenzhen 518055, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2024 / Vol. 31

Funding:

National Natural Science Foundation of China;

Keywords:

3D human pose estimation; attention mechanism; graph convolutional network; spatial-temporal fusion; TRANSFORMER; NETWORK;

DOI:

10.1109/LSP.2023.3339060

Chinese Library Classification (CLC) codes:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Subject classification codes:

0808; 0809;

Abstract:

Single-view 3D human pose estimation (HPE) based on graph convolutional networks (GCNs) currently suffers from insufficient feature representation and depth ambiguity. To address these issues, this letter proposes a hierarchical spatial-temporal adaptive graph fusion framework for monocular 3D HPE. First, to enhance the spatial semantic feature representation of human joint nodes, a progressive adaptive graph feature capture strategy is developed, which adaptively constructs global-to-local attention graph features over all human joints in a coarse-to-fine manner. A spatial-temporal attention fusion module is then constructed to model long-term sequential dependencies and mitigate depth ambiguity: the temporal attention factors of related frames are captured and used to provide intermediate supervision of joint depth. The spatial semantic information among all joints in the same frame and the temporal contextual knowledge of each joint across relevant frames are fused to build spatial-temporal correlations and refine the final features. Extensive experiments on two popular benchmarks show that the proposed method outperforms several state-of-the-art approaches.
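The abstract describes its two components only at a high level. The sketch below is a hypothetical, minimal illustration of the general ideas in plain Python, not the authors' architecture: one graph-convolution layer whose adjacency blends a fixed skeleton prior with a feature-dependent attention graph, followed by scaled dot-product attention from the center frame over all frames. The function names (`normalized_adjacency`, `adaptive_graph_conv`, `temporal_attention_fusion`), the mixing weight `alpha`, and the 4-joint toy skeleton are assumptions for illustration only; the progressive coarse-to-fine strategy and the intermediate depth supervision are not reproduced.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def normalized_adjacency(edges, num_joints):
    """Hypothetical fixed prior: skeleton adjacency with self-loops, row-normalized."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in a]

def adaptive_graph_conv(x, a_skel, w, alpha=0.5):
    """Assumed layer H = ((1 - alpha) * A_skel + alpha * S) X W, where
    S = softmax(X X^T / sqrt(C)) is a data-dependent attention graph over joints."""
    c = len(x[0])
    x_t = [list(col) for col in zip(*x)]
    scores = [[v / math.sqrt(c) for v in row] for row in matmul(x, x_t)]
    s = softmax_rows(scores)
    adj = [[(1 - alpha) * p + alpha * q for p, q in zip(rp, rq)]
           for rp, rq in zip(a_skel, s)]
    return matmul(matmul(adj, x), w)

def temporal_attention_fusion(frames, center):
    """Attend from the center frame to every frame (flattened J x C features)
    and return the attention-weighted J x C fusion plus the T weights."""
    flat = [[v for row in f for v in row] for f in frames]
    q = flat[center]
    logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in flat]
    weights = softmax_rows([logits])[0]
    num_j, num_c = len(frames[0]), len(frames[0][0])
    fused = [[sum(wt * f[j][c] for wt, f in zip(weights, frames)) for c in range(num_c)]
             for j in range(num_j)]
    return fused, weights

# Usage on random toy data: T frames of J joints with C-dimensional features.
random.seed(0)
J, C, T = 4, 3, 5
edges = [(0, 1), (1, 2), (1, 3)]  # hypothetical 4-joint toy skeleton
a_skel = normalized_adjacency(edges, J)
w = [[random.gauss(0, 0.5) for _ in range(C)] for _ in range(C)]
frames = [[[random.gauss(0, 1) for _ in range(C)] for _ in range(J)] for _ in range(T)]
spatial = [adaptive_graph_conv(f, a_skel, w) for f in frames]
fused, weights = temporal_attention_fusion(spatial, center=T // 2)
print(len(fused), len(fused[0]), round(sum(weights), 6))
```

Blending a fixed skeleton graph with a learned attention graph is one common way to let a GCN capture relations between joints that are not physically connected; the temporal weights here form a convex combination, so each fused feature lies within the range of the per-frame features.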


Pages: 61-65

Page count: 5
