ABYOLOv4: improved YOLOv4 human object detection based on enhanced multi-scale feature fusion

Times cited: 5
Authors
Li, Rui [1,2]
Zeng, Xin [1]
Yang, Shiqiang [1]
Li, Qi [1]
Yan, An [1]
Li, Dexin [1]
Affiliations
[1] Xi'an University of Technology, School of Mechanical and Precision Instrument Engineering, Xi'an 710048, People's Republic of China
[2] Xi'an People's Hospital, Xi'an, People's Republic of China
Keywords
Deep learning; Human object detection; YOLOv4; ASPP; Bi-FPN; Networks
DOI
10.1186/s13634-023-01105-z
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline codes
0808; 0809
Abstract
The purpose of human object detection is to obtain the number of people in an image and their positions, which is one of the core problems in machine vision. However, because human scale varies widely in human object detection tasks, small- and medium-sized human bodies suffer from a high miss rate, which still limits detection performance. To solve this problem, this paper proposes an improved ASPP_BiFPN_YOLOv4 (ABYOLOv4) method for human object detection. First, an Atrous Spatial Pyramid Pooling (ASPP) module replaces the original Spatial Pyramid Pooling (SPP) module to enlarge the receptive field of the network and improve its perception of multi-scale targets. Then, the original Path Aggregation Network (PANet) multi-scale fusion module is replaced by a self-built bi-layer bidirectional feature pyramid network (Bi-FPN). Meanwhile, an additional feature level is introduced into the model to reuse mid- and low-level features, which enhances the network's ability to represent small- and medium-sized targets. Finally, the standard convolutions in Bi-FPN are replaced by depthwise separable convolutions so that the network balances accuracy against the number of parameters. To evaluate the proposed ABYOLOv4 model, human object detection experiments were carried out on the public VOC2007 and VOC2012 datasets: the AP of the improved YOLOv4 algorithm is 0.5% higher than that of the original algorithm, and the model's weight file is 45.3 MB smaller. The experimental results demonstrate that the proposed ABYOLOv4 network achieves higher accuracy at lower computational cost for human target detection.
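Two mechanisms in the abstract can be checked with simple arithmetic: atrous (dilated) convolution enlarges the window a kernel sees without adding weights, and depthwise separable convolution cuts the weight count of a layer. The sketch below computes both. The dilation rates (1, 6, 12, 18) and the 256-channel 3 x 3 layer are assumptions borrowed from common DeepLab-style settings, not the configuration reported for ABYOLOv4, and the function names are hypothetical helpers chosen for illustration.

```python
# Illustrative arithmetic only: dilation rates and channel widths are assumed
# values, not the configuration used in ABYOLOv4.


def dilated_kernel_extent(k: int, rate: int) -> int:
    """Side length of the input window covered by one k x k atrous kernel.

    Inserting (rate - 1) gaps between adjacent taps stretches the kernel to
    k + (k - 1) * (rate - 1) pixels without adding any weights.
    """
    return k + (k - 1) * (rate - 1)


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Assumed DeepLab-style ASPP branch rates, each applied to a 3 x 3 kernel.
    for rate in (1, 6, 12, 18):
        print(f"rate {rate:>2}: 3x3 kernel spans {dilated_kernel_extent(3, rate)} pixels")

    # Assumed 256 -> 256 channel 3 x 3 layer inside the feature pyramid.
    std = standard_conv_params(256, 256, 3)
    dws = depthwise_separable_params(256, 256, 3)
    print(f"standard: {std}, depthwise separable: {dws}, ratio: {dws / std:.3f}")
```

For these assumed sizes the script reports spans of 3, 13, 25 and 37 pixels, and 589824 versus 67840 weights (ratio 0.115). The ratio is exactly 1/c_out + 1/k^2, so a 3 x 3 layer shrinks by roughly 8 to 9 times; how much a whole network's weight file shrinks depends on how many layers are replaced.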
Pages: 16