Mobile-friendly and multi-feature aggregation via transformer for human pose estimation

Times Cited: 0
Authors
Li, Biao [1 ,2 ]
Tang, Shoufeng [1 ]
Li, Wenyi [1 ,2 ]
Affiliations
[1] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Peoples R China
[2] Suzhou Univ, Sch Mech & Elect Engn, Suzhou 234000, Peoples R China
Keywords
Human pose estimation; Lightweight network; Multi-feature aggregation; Hybrid architecture
DOI
10.1016/j.imavis.2024.105343
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Human pose estimation is pivotal for human-centric visual tasks, yet deploying such models on mobile devices remains challenging due to high parameter counts and computational demands. In this paper, we study Mobile-Friendly and Multi-Feature Aggregation architectural designs for human pose estimation and propose a novel model called MobileMultiPose. Specifically, a lightweight aggregation method incorporating multi-scale and multi-feature information mitigates redundant shallow semantic extraction and the locality constraints on deep semantics. To efficiently aggregate diverse local and global features, we design a lightweight transformer module, built on a self-attention mechanism with linear complexity, that achieves deep fusion of shallow and deep semantics. Furthermore, a multi-scale loss supervision method is incorporated into training to enhance model performance, facilitating effective fusion of edge information across scales. Extensive experiments show that the smallest MobileMultiPose variant outperforms lightweight models (MobileNetv2, ShuffleNetv2, and Small HRNet) by 0.7, 5.4, and 10.1 points, respectively, on the COCO validation set, with fewer parameters and FLOPs. In particular, the largest MobileMultiPose variant achieves an AP of 72.4 on the COCO test-dev set, while its parameters and FLOPs are only 16% and 18% of those of HRNet-W32, and 7% and 9% of those of DARK, respectively. We aim to offer novel insights into designing lightweight, efficient feature-extraction networks that support mobile-friendly model deployment.
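The abstract does not spell out the "self-attention mechanism with linear complexity." As an illustration only, the PyTorch sketch below shows one common way to obtain linear-complexity attention, the kernelized formulation of Katharopoulos et al. (an elu(x)+1 feature map), which summarizes keys against values before attending so the cost grows linearly in the token count. The module and parameter names are hypothetical, not the paper's.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearSelfAttention(nn.Module):
    # Kernelized self-attention with O(N) cost in the number of tokens.
    # Illustrative sketch; the paper's actual transformer module may differ.
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape                        # x: (batch, tokens, dim)
        h = self.heads
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split heads: (batch, heads, tokens, head_dim).
        q, k, v = (t.view(b, n, h, d // h).transpose(1, 2) for t in (q, k, v))
        # Positive feature map replaces the softmax kernel.
        q, k = F.elu(q) + 1, F.elu(k) + 1
        # Aggregate keys against values first: a (head_dim x head_dim)
        # summary per head, so the cost is O(n) rather than O(n^2).
        kv = torch.einsum('bhnd,bhne->bhde', k, v)
        z = 1.0 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2)) + 1e-6)
        out = torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.proj(out)

Likewise, the "multi-scale loss supervision" is sketched under one plausible reading: a mean-squared heatmap loss applied at each output scale, with the ground-truth heatmaps resized to match each prediction. The per-scale weights are an assumption; the abstract does not state the paper's weighting scheme.

def multi_scale_heatmap_loss(preds, target, weights=None):
    # preds:   list of (B, K, H_i, W_i) heatmaps from different stages.
    # target:  (B, K, H, W) ground-truth heatmaps at the finest scale.
    # weights: per-scale loss weights (hypothetical; not given in the abstract).
    weights = weights if weights is not None else [1.0] * len(preds)
    loss = 0.0
    for p, w in zip(preds, weights):
        # Resize ground truth to the prediction's resolution before MSE.
        t = F.interpolate(target, size=p.shape[-2:], mode='bilinear',
                          align_corners=False)
        loss = loss + w * F.mse_loss(p, t)
    return loss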
Pages: 12
Related Papers
75 in total
[31] Li K., 2022, ICLR 2022.
[32] Li, Ke; Wang, Shijie; Zhang, Xiang; Xu, Yifan; Xu, Weijian; Tu, Zhuowen. Pose Recognition with Cascade Transformers. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021, pp. 1944-1953.
[33] Li, Xiang; Wang, Wenhai; Hu, Xiaolin; Yang, Jian. Selective Kernel Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019, pp. 510-519.
[34] Li, Yanjie; Yang, Sen; Liu, Peidong; Zhang, Shoukui; Wang, Yunxiao; Wang, Zhicheng; Yang, Wankou; Xia, Shu-Tao. SimCC: A Simple Coordinate Classification Perspective for Human Pose Estimation. Computer Vision - ECCV 2022, Part VI, 2022, vol. 13666, pp. 89-106.
[35] Li, Yanjie; Zhang, Shoukui; Wang, Zhicheng; Yang, Sen; Yang, Wankou; Xia, Shu-Tao; Zhou, Erjin. TokenPose: Learning Keypoint Tokens for Human Pose Estimation. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021, pp. 11293-11302.
[36] Lin H., 2022, PROC IEEE INT C MULT.
[37] Lin, Tsung-Yi; Maire, Michael; Belongie, Serge; Hays, James; Perona, Pietro; Ramanan, Deva; Dollar, Piotr; Zitnick, C. Lawrence. Microsoft COCO: Common Objects in Context. Computer Vision - ECCV 2014, Part V, 2014, vol. 8693, pp. 740-755.
[38] Liu, Ze; Lin, Yutong; Cao, Yue; Hu, Han; Wei, Yixuan; Zhang, Zheng; Lin, Stephen; Guo, Baining. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021, pp. 9992-10002.
[39] Luo, Zhengxiong; Wang, Zhicheng; Huang, Yan; Wang, Liang; Tan, Tieniu; Zhou, Erjin. Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021, pp. 13259-13268.
[40] Ma, Ningning; Zhang, Xiangyu; Zheng, Hai-Tao; Sun, Jian. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. Computer Vision - ECCV 2018, Part XIV, 2018, vol. 11218, pp. 122-138.