WideHRNet: An Efficient Model for Human Pose Estimation Using Wide Channels in Lightweight High-Resolution Network

Cited by: 0
Authors
Samkari, Esraa [1]
Arif, Muhammad [1]
AlGhamdi, Manal [1]
Al Ghamdi, Mohammed A. [1]
Affiliation
[1] Umm Al Qura Univ, Dept Comp Sci & Artificial Intelligence, Mecca 21955, Saudi Arabia
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Convolution neural network; efficient network; human pose estimation; wide network;
DOI
10.1109/ACCESS.2024.3476196
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Human pose estimation is the task of locating body joints in an image. Current deep learning models estimate most joint locations accurately but struggle with smaller joints, such as the wrist and ankle, which lowers their accuracy. To address this problem, existing models add more layers, making the network deeper to gain accuracy at the cost of added complexity. We instead present an efficient network that estimates small joints by capturing more features through wider channels. The network is organized into multiple stages and multiple branches and maintains a high-resolution output throughout. Hence, we call it the Wide High-Resolution Network (WideHRNet). WideHRNet provides several advantages. First, it runs in parallel and provides a high-resolution output. Second, unlike heavyweight networks, it obtains superior results using only a few layers. Third, its complexity can be controlled by adjusting the channel-expansion hyperparameter. Fourth, its performance is further enhanced by adding an attention mechanism. Experimental results on the MPII dataset show that WideHRNet outperforms state-of-the-art efficient models, achieving 88.47% with the attention block.
Pages: 148990-149000
Number of pages: 11