AiPE: A Novel Transformer-Based Pose Estimation Method

Cited by: 0
Authors
Lu, Kai [1 ]
Min, Dugki [1 ]
Affiliations
[1] Konkuk Univ, Coll Engn, Dept Comp Sci & Engn, Seoul 05029, South Korea
Keywords
pose estimation; transformer model; attention; computer vision
DOI
10.3390/electronics13050967
Chinese Library Classification (CLC)
TP [automation and computer technology]
Discipline Classification Code
0812
Abstract
Human pose estimation is an important problem in computer vision because it is the foundation for many advanced semantic tasks and downstream applications. Although some convolutional neural network-based pose estimation methods have achieved good results, these networks remain limited by restricted receptive fields and weak robustness, leading to poor detection performance in blurry or low-resolution scenarios. Additionally, their highly parallelized strategy tends to impose significant computational demands, requiring high computing power. Compared with convolutional neural networks, transformer-based methods offer advantages such as flexible stacking, a global perspective, and parallel computation. Building on these benefits, a novel transformer-based human pose estimation method is developed that employs multi-head self-attention mechanisms and offset windows to effectively suppress the rapid growth of computational complexity around human keypoints. Experimental results, based on detailed visual comparison and quantitative analysis, demonstrate that the proposed method can efficiently handle pose estimation in challenging scenarios, such as blurry or occluded scenes. Furthermore, errors in human skeleton mapping caused by keypoint occlusion or omission can be effectively corrected, greatly improving the accuracy of the pose estimation results.
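The abstract attributes the method's tractable cost to multi-head self-attention computed over offset (shifted) windows. As an illustration of that idea only, below is a minimal PyTorch sketch of window-partitioned multi-head self-attention in the Swin-Transformer style; the class name, parameters, and tensor shapes are assumptions for exposition, not the authors' released implementation. Restricting attention to w×w windows reduces the cost from quadratic in the total number of tokens to linear, with a w² factor per window.

```python
# Illustrative sketch only: window-partitioned multi-head self-attention
# in the spirit of the abstract's "offset windows" (not the AiPE code).
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Multi-head self-attention restricted to non-overlapping windows.

    Attending within w*w windows costs O(N * w^2) rather than the O(N^2)
    of global attention, which is the complexity saving the abstract
    attributes to windowed attention.
    """
    def __init__(self, dim: int, num_heads: int, window_size: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.window_size = window_size
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) feature map; H and W divisible by window_size.
        B, H, W, C = x.shape
        w = self.window_size
        # Partition the map into (B * num_windows, w*w, C) token groups.
        x = x.view(B, H // w, w, W // w, w, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)
        # Standard multi-head attention, computed within each window.
        qkv = self.qkv(x).reshape(x.shape[0], w * w, 3,
                                  self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2)
        out = self.proj(out.reshape(x.shape[0], w * w, C))
        # Reverse the window partition back to (B, H, W, C).
        out = out.view(B, H // w, W // w, w, w, C)
        return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

# In shifted-window designs, alternating layers roll the feature map by
# half a window before attention so information crosses window borders:
# x_shifted = torch.roll(x, shifts=(-w // 2, -w // 2), dims=(1, 2))
```

In shifted-window designs the same module is reused unchanged on the rolled feature map, which is how information propagates across window boundaries without paying for global attention.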
Pages: 16