InferTrans: Hierarchical structural fusion transformer for crowded human pose estimation

Cited by: 0

Authors:

Li, Muyu [1,2]

Wang, Yingfeng [4]

Hu, Henan [3]

Zhao, Xudong [1,2]

Affiliations:

[1] Dalian Univ Technol, Inst Intelligent Sci & Technol, Sch Control Sci & Engn, Dalian 116024, Liaoning, Peoples R China

[2] Dalian Univ Technol, Key Lab Intelligent Control & Optimizat Ind Equipm, Minist Educ, Dalian 116024, Liaoning, Peoples R China

[3] Dalian Jiaotong Univ, Sch Mech Engn, Dalian 116028, Liaoning, Peoples R China

[4] Ctr Intelligent Multidimens Data Anal, Hong Kong Sci Pk, Hong Kong, Peoples R China

Source:

INFORMATION FUSION | 2025 / Vol. 117

Keywords:

Human pose estimation; Occlusion handling; Transformer;

DOI:

10.1016/j.inffus.2024.102878

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human pose estimation in crowded scenes presents unique challenges due to frequent occlusions and complex interactions between individuals. To address these issues, we introduce InferTrans, a hierarchical structural fusion Transformer designed to improve crowded human pose estimation. InferTrans integrates semantic features into structural information using a hierarchical joint-limb-semantic fusion module. By reorganizing joints and limbs into a tree structure, the fusion module facilitates effective information exchange across different structural levels and leverages both global structural information and local contextual details. Furthermore, we explicitly model limb structural patterns separately from joints, treating limbs as vectors with defined lengths and orientations. This allows our model to infer complete human poses from minimal input, significantly enhancing pose refinement tasks. Extensive experiments on multiple datasets demonstrate that InferTrans outperforms existing pose estimation techniques in crowded and occluded scenarios. The proposed InferTrans serves as a robust post-processing technique and is capable of improving the accuracy and robustness of pose estimation in challenging environments.
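
The abstract's idea of treating limbs as vectors with a length and an orientation, arranged in a tree, can be illustrated with plain geometry. The sketch below is not the authors' model (which learns these patterns with a Transformer); it only shows how a pose can be rebuilt from a root joint plus limb vectors. The skeleton `LIMBS`, the joint names, and the functions `limb_vectors` and `place_joints` are hypothetical and chosen for illustration.

```python
import math

# Hypothetical skeleton: (parent_joint, child_joint) pairs forming a tree
# rooted at "neck". These names are illustrative, not taken from the paper.
LIMBS = [
    ("neck", "left_shoulder"),
    ("left_shoulder", "left_elbow"),
    ("left_elbow", "left_wrist"),
]

def limb_vectors(joints):
    """Return {limb: (length, orientation)} for limbs whose two joints are known."""
    out = {}
    for parent, child in LIMBS:
        if parent in joints and child in joints:
            dx = joints[child][0] - joints[parent][0]
            dy = joints[child][1] - joints[parent][1]
            # Orientation is the angle of the limb vector in radians.
            out[(parent, child)] = (math.hypot(dx, dy), math.atan2(dy, dx))
    return out

def place_joints(root_xy, limbs, root="neck"):
    """Walk the tree from the root, placing each child at parent + limb vector."""
    joints = {root: root_xy}
    for parent, child in LIMBS:  # LIMBS lists every parent before its children
        if parent in joints and (parent, child) in limbs:
            length, angle = limbs[(parent, child)]
            px, py = joints[parent]
            joints[child] = (px + length * math.cos(angle),
                             py + length * math.sin(angle))
    return joints
```

Given only the root position and the per-limb lengths and orientations, `place_joints` recovers every joint reachable through the tree, which is the sense in which a limb-vector representation lets a full pose follow from a small amount of input.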


Pages: 14
