A Lightweight Object Detector Based on Spatial-Coordinate Self-Attention for UAV Aerial Images

Cited by: 28

Authors:

Liu, Chen [1]

Yang, Degang [1]

Tang, Liu [1]

Zhou, Xun [2]

Deng, Yi [1]

Affiliations:

[1] Chongqing Normal Univ, Coll Comp & Informat Sci, Chongqing 401331, Peoples R China

[2] Party Sch Yibin Comm Communist Party China, Yibin 644000, Peoples R China

Source:

REMOTE SENSING | 2023 / Vol. 15 / No. 01

Funding:

National Natural Science Foundation of China;

Keywords:

object detection; aerial images; attention mechanism; lightweight network; unmanned aerial vehicle (UAV);

DOI:

10.3390/rs15010083

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science];

Discipline classification code:

08 ; 0830 ;

Abstract:

Object detection is one of the most widespread applications among Unmanned Aerial Vehicle (UAV) tasks. Because of the UAV's shooting angle and flying height, small objects account for a larger proportion of aerial images than of general scenes, and common object detectors are not very effective on aerial images. Moreover, since the computing resources of UAV platforms are generally limited, deploying common detectors with a large number of parameters on these platforms is difficult. This paper proposes YOLO-UAVlite, a lightweight object detector for aerial images. First, the spatial attention module and the coordinate attention module are modified and combined to form a novel Spatial-Coordinate Self-Attention (SCSA) module, which integrates spatial, location, and channel information to enhance object representation. On this basis, we construct a lightweight backbone, named SCSAshufflenet, which combines the Enhanced ShuffleNet (ES) network with the proposed SCSA module to improve feature extraction and reduce model size. Second, we propose an improved feature pyramid model, Slim-BiFPN, in which new lightweight convolutional blocks reduce the information loss during feature map fusion while also reducing the model weights. Finally, the localization loss function is modified to increase the bounding box regression rate while improving localization accuracy. Extensive experiments on the VisDrone-DET2021 dataset indicate that, compared with the YOLOv5-N baseline, the proposed YOLO-UAVlite reduces the number of parameters by 25.8% and achieves a gain of 10.9% in mAP0.50. Compared with other lightweight detectors, it improves on both the mAP and the number of parameters.

References

Pages: 21

40 entries in total

[1] Bochkovskiy A., 2020, YOLOv4: Optimal Speed and Accuracy of Object Detection

[2] Chen, 2021, Zenodo, DOI 10.5281/ZENODO.5241425

[3] Context-Aware Block Net for Small Object Detection [J]. Cui, Lisha; Lv, Pei; Jiang, Xiaoheng; Gao, Zhimin; Zhou, Bing; Zhang, Luming; Shao, Ling; Xu, Mingliang. IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (04): 2300-2313

[4] Fast R-CNN [J]. Girshick, Ross. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 1440-1448

[5] Rich feature hierarchies for accurate object detection and semantic segmentation [J]. Girshick, Ross; Donahue, Jeff; Darrell, Trevor; Malik, Jitendra. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014: 580-587

[6] GhostNet: More Features from Cheap Operations [J]. Han, Kai; Wang, Yunhe; Tian, Qi; Guo, Jianyuan; Xu, Chunjing; Xu, Chang. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020: 1577-1586

[7] He J., 2021, Adv. Neural Inf. Process. Syst, V34, P20230

[8] Mask R-CNN [J]. He, Kaiming; Gkioxari, Georgia; Dollar, Piotr; Girshick, Ross. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 2980-2988

[9] Coordinate Attention for Efficient Mobile Network Design [J]. Hou, Qibin; Zhou, Daquan; Feng, Jiashi. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 13708-13717

[10] Howard A.G., 2017, MOBILENETS EFFICIENT
