InferTrans: Hierarchical structural fusion transformer for crowded human pose estimation

Authors
Li, Muyu [1, 2]
Wang, Yingfeng [4]
Hu, Henan [3]
Zhao, Xudong [1, 2]
Affiliations
[1] Dalian Univ Technol, Inst Intelligent Sci & Technol, Sch Control Sci & Engn, Dalian 116024, Liaoning, Peoples R China
[2] Dalian Univ Technol, Key Lab Intelligent Control & Optimizat Ind Equipm, Minist Educ, Dalian 116024, Liaoning, Peoples R China
[3] Dalian Jiaotong Univ, Sch Mech Engn, Dalian 116028, Liaoning, Peoples R China
[4] Ctr Intelligent Multidimens Data Anal, Hong Kong Sci Pk, Hong Kong, Peoples R China
Keywords
Human pose estimation; Occlusion handling; Transformer
DOI
10.1016/j.inffus.2024.102878
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human pose estimation in crowded scenes is challenging because of frequent occlusions and complex interactions between individuals. To address these issues, we introduce InferTrans, a hierarchical structural fusion Transformer designed to improve crowded human pose estimation. InferTrans integrates semantic features into structural information using a hierarchical joint-limb-semantic fusion module. By reorganizing joints and limbs into a tree structure, the fusion module facilitates effective information exchange across structural levels and leverages both global structural information and local contextual details. Furthermore, we explicitly model limb structural patterns separately from joints, treating limbs as vectors with defined lengths and orientations. This allows our model to infer complete human poses from minimal input, which significantly benefits pose refinement. Extensive experiments on multiple datasets demonstrate that InferTrans outperforms existing pose estimation techniques in crowded and occluded scenarios. InferTrans serves as a robust post-processing technique capable of improving the accuracy and robustness of pose estimation in challenging environments.
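The abstract's idea of treating limbs as vectors with a length and an orientation, arranged along a joint tree, can be illustrated with plain geometry. The sketch below is not the InferTrans network or the authors' code; the joint names, coordinates, tree, and the helper functions `limb_vectors` and `infer_joints` are all hypothetical, chosen only to show how a root joint plus limb vectors is enough to recover every joint position.

```python
import math

# Hypothetical 2D joints (x, y); names and coordinates are illustrative only.
joints = {
    "hip": (0.0, 0.0),
    "knee": (0.0, 3.0),
    "ankle": (4.0, 3.0),
}

# Hypothetical tree: each child joint maps to its parent; "hip" is the root.
parent = {"knee": "hip", "ankle": "knee"}


def limb_vectors(joints, parent):
    """Return {(parent, child): (length, orientation_in_radians)} per limb."""
    limbs = {}
    for child, par in parent.items():
        dx = joints[child][0] - joints[par][0]
        dy = joints[child][1] - joints[par][1]
        limbs[(par, child)] = (math.hypot(dx, dy), math.atan2(dy, dx))
    return limbs


def infer_joints(root_name, root_xy, limbs):
    """Rebuild all joint positions from the root position and limb vectors."""
    placed = {root_name: root_xy}
    pending = dict(limbs)
    while pending:
        progressed = False
        for (par, child), (length, angle) in list(pending.items()):
            if par in placed:
                # Walk from the already-placed parent along the limb vector.
                px, py = placed[par]
                placed[child] = (
                    px + length * math.cos(angle),
                    py + length * math.sin(angle),
                )
                del pending[(par, child)]
                progressed = True
        if not progressed:
            raise ValueError("some limbs are not connected to the root")
    return placed


limbs = limb_vectors(joints, parent)
rebuilt = infer_joints("hip", joints["hip"], limbs)
```

Here `rebuilt` matches `joints` up to floating-point rounding, which is the sense in which a limb-vector representation carries the full pose once one joint is known. How InferTrans learns or fuses such limb patterns is not described in this record.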
Pages: 14