Small Target Detection Model in Aerial Images Based on TCA-YOLOv5m

Cited by: 8

Authors:

Huang, Min [1,2]

Zhang, Yiyan [2]

Chen, Yazhou [1]

Affiliations:

[1] Army Engn Univ, Natl Key Lab Electromagnet Environm Effects, Shijiazhuang Campus, Shijiazhuang 050003, Peoples R China

[2] Hebei Univ Sci & Technol, Sch Informat Sci & Engn, Shijiazhuang 050018, Peoples R China

Source:

IEEE ACCESS | 2023 / Volume 11

Keywords:

Feature extraction; Object detection; Classification algorithms; Proposals; Prediction algorithms; Transformers; Deep learning; Aerial images; small target detection; TCA-YOLOv5m; transformer algorithm; coordinate attention; path aggregation network; LANGUAGE; TRENDS;

DOI:

10.1109/ACCESS.2022.3232293

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Target detection in aerial images taken by unmanned aerial vehicles (UAVs) is currently one of the most widely used application scenarios. Compared with ordinary images, aerial images have more complex backgrounds and smaller targets, which results in lower detection precision and a high false detection rate. This paper proposes a new small target detection model, TCA-YOLOv5m, which is based on YOLOv5m and combines the Transformer algorithm with the Coordinate Attention (CA) mechanism. In this model, the Transformer algorithm is added to the end of the YOLOv5 backbone, which enables the model to extract more feature information from images. In the neck layer of TCA-YOLOv5m, the Path Aggregation Network (PANet) and the Transformer algorithm are combined to enhance the expressive capacity of the feature pyramid and improve the detection precision for occluded, high-density small targets, and CA is introduced to locate targets more accurately in high-density scenes. In addition, TCA-YOLOv5m adds a detection layer to improve its ability to capture small targets. Experiments on the VisDrone 2019 dataset compare the detection precision and detection speed of the proposed model with those of baseline models. The experimental results indicate that the detection precision of TCA-YOLOv5m reaches 97.4%, which is 5.2% higher than that of YOLOv5; mAP@50 reaches 58.5%, which is 14.8% higher than that of YOLOv5. TCA-YOLOv5m runs at 12.96 frames per second (FPS), which ensures a certain degree of real-time performance. Therefore, TCA-YOLOv5m is suitable for detecting dense small targets in aerial images.

Citation

Pages: 3352-3366

Page count: 15

