A cross-feature interaction network for 3D human pose estimation

Cited by: 0
Authors
Peng, Jihua [1]
Zhou, Yanghong [3]
Mok, P. Y. [1, 2, 4, 5]
Affiliations
[1] Hong Kong Polytech Univ, Sch Fash & Text, Hong Kong, Peoples R China
[2] Lab Artificial Intelligence Design, Hong Kong, Peoples R China
[3] Hong Kong Polytech Univ, Res Ctr Text Future Fash, Hong Kong, Peoples R China
[4] Hong Kong Polytech Univ, Res Inst Sports Sci & Technol, Hong Kong, Peoples R China
[5] Hong Kong Univ Sci & Technol, Div Integrat Syst & Design, Hong Kong, Peoples R China
Keywords
3D human pose estimation; graph convolutional network (GCN); self-attention; cross-attention
DOI
10.1016/j.patrec.2025.01.016
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The task of estimating 3D human poses from single monocular images is challenging because, unlike video sequences, single images can hardly provide any temporal information for the prediction. Most existing methods attempt to predict 3D poses by modeling the spatial dependencies inherent in the anatomical structure of the human skeleton, yet these methods fail to capture the complex local and global relationships that exist among various joints. To solve this problem, we propose a novel Cross-Feature Interaction Network to effectively model spatial correlations between body joints. Specifically, we exploit graph convolutional networks (GCNs) to learn the local features between neighboring joints and the self-attention structure to learn the global features among all joints. We then design a cross-feature interaction (CFI) module to facilitate cross-feature communications among the three different features, namely the local features, global features, and initial 2D pose features, aggregating them to form enhanced spatial representations of human pose. Furthermore, a novel graph-enhanced module (GraMLP) with parallel GCN and multi-layer perceptron is introduced to inject the skeletal knowledge of the human body into the final representation of 3D pose. Extensive experiments on two datasets (Human3.6M (Ionescu et al., 2013) and MPI-INF-3DHP (Mehta et al., 2017)) show the superior performance of our method in comparison to existing state-of-the-art (SOTA) models. The code and data are shared at https://github.com/JihuaPeng/CFI-3DHPE
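The data flow described in the abstract (GCN for local joint features, self-attention for global joint features, then an interaction step that mixes them with the initial 2D pose features) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' CFI or GraMLP modules: the toy 5-joint chain skeleton, the feature width, the random weights, and the choice of plain cross-attention with a residual connection as the fusion step are all assumptions made for illustration. The published implementation is in the repository linked above.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def normalized_adjacency(edges, num_joints):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of the skeleton graph."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d[:, None] * d[None, :]


def gcn_layer(x, a_hat, w):
    """Local features: each joint aggregates itself and its skeletal neighbours."""
    return np.maximum(a_hat @ x @ w, 0.0)


def attention(q_in, kv_in, wq, wk, wv):
    """Scaled dot-product attention; it is self-attention when q_in is kv_in."""
    q, k, v = q_in @ wq, kv_in @ wk, kv_in @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


rng = np.random.default_rng(0)
num_joints, dim = 5, 8                      # hypothetical sizes
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]    # hypothetical chain skeleton

pose_2d = rng.normal(size=(num_joints, 2))  # stand-in for detected 2D joints
x0 = pose_2d @ rng.normal(size=(2, dim))    # initial 2D pose features

# Local branch: graph convolution over neighbouring joints.
a_hat = normalized_adjacency(edges, num_joints)
local_feat = gcn_layer(x0, a_hat, rng.normal(size=(dim, dim)))

# Global branch: self-attention among all joints.
wq, wk, wv = (rng.normal(size=(dim, dim)) for _ in range(3))
global_feat = attention(x0, x0, wq, wk, wv)

# Assumed fusion: local features query the global features (cross-attention),
# and the result is added to the initial 2D pose features.
cq, ck, cv = (rng.normal(size=(dim, dim)) for _ in range(3))
fused = x0 + attention(local_feat, global_feat, cq, ck, cv)

# Linear regression head mapping each joint feature to (x, y, z).
pose_3d = fused @ rng.normal(size=(dim, 3))
print(pose_3d.shape)
```

With untrained random weights the output carries no meaning; the sketch only shows how the three feature streams named in the abstract could be combined per joint.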
Pages: 175-181
Number of pages: 7
Related papers
44 in total
[31]   SACANet: end-to-end self-attention-based network for 3D clothing animation [J].
Chen, Yunxi ;
Cao, Yuanjie ;
Fang, Fei ;
Huang, Jin ;
Hu, Xinrong ;
He, Ruhan ;
Zhang, Junjie .
VISUAL COMPUTER, 2024, :3829-3842
[32]   SC3D: Semantic-guided and Class-adaptive cross-domain fusion for 3D object detection in autonomous vehicles [J].
Mushtaq, Husnain ;
Deng, Xiaoheng ;
Alizadehsani, Roohallah ;
Iqbal, Muhammad Shahid ;
Khan, Tamoor ;
Abbasi, Adeel Ahmed .
EXPERT SYSTEMS WITH APPLICATIONS, 2025, 268
[33]   Enhancing Few-Shot 3D Point Cloud Classification With Soft Interaction and Self-Attention [J].
Khan, Abdullah Aman ;
Shao, Jie ;
Shafiq, Sidra ;
Zhu, Shuyuan ;
Shen, Heng Tao .
IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 :1127-1141
[34]   DFA-SAT: Dynamic Feature Abstraction with Self-Attention-Based 3D Object Detection for Autonomous Driving [J].
Mushtaq, Husnain ;
Deng, Xiaoheng ;
Ali, Mubashir ;
Hayat, Babur ;
Raza Sherazi, Hafiz Husnain .
SUSTAINABILITY, 2023, 15 (18)
[35]   Hash Self-Attention End-to-End Network for Sketch-Based 3D Shape Retrieval [J].
Zhao X. ;
Pan X. ;
Liu F. ;
Zhang S. .
Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2021, 33 (05) :798-805
[36]   CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection [J].
Hwang, Jyh-Jing ;
Kretzschmar, Henrik ;
Manela, Joshua ;
Rafferty, Sean ;
Armstrong-Crews, Nicholas ;
Chen, Tiffany ;
Anguelov, Dragomir .
COMPUTER VISION, ECCV 2022, PT XXXVIII, 2022, 13698 :388-405
[37]   3D lymphoma segmentation on PET/CT images via multi-scale information fusion with cross-attention [J].
Huang, Huan ;
Qiu, Liheng ;
Yang, Shenmiao ;
Li, Longxi ;
Nan, Jiaofen ;
Li, Yanting ;
Han, Chuang ;
Zhu, Fubao ;
Zhao, Chen ;
Zhou, Weihua .
MEDICAL PHYSICS, 2025, 52 (06) :4371-4389
[38]   A Self-Attentive Hybrid Coding Network for 3D Change Detection in High-Resolution Optical Stereo Images [J].
Pan, Jianping ;
Li, Xin ;
Cai, Zhuoyan ;
Sun, Bowen ;
Cui, Wei .
REMOTE SENSING, 2022, 14 (09)
[39]   AMS-Net: An Attention-Based Multi-Scale Network for Classification of 3D Terracotta Warrior Fragments [J].
Liu, Jie ;
Cao, Xin ;
Zhang, Pingchuan ;
Xu, Xueli ;
Liu, Yangyang ;
Geng, Guohua ;
Zhao, Fengjun ;
Li, Kang ;
Zhou, Mingquan .
REMOTE SENSING, 2021, 13 (18)
[40]   GFA-SMT: Geometric Feature Aggregation and Self-Attention in a Multi-Head Transformer for 3D Object Detection in Autonomous Vehicles [J].
Mushtaq, Husnain ;
Deng, Xiaoheng ;
Jiang, Ping ;
Wan, Shaohua ;
Ali, Mubashir ;
Ullah, Irshad .
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2025, 26 (03) :3557-3573