YOLO-SSP: an object detection model based on pyramid spatial attention and improved downsampling strategy for remote sensing images

Cited by: 10
Authors
Liu, Yongli [1 ]
Yang, Degang [1 ,2 ]
Song, Tingting [1 ]
Ye, Yichen [3 ]
Zhang, Xin [1 ]
Affiliations
[1] Chongqing Normal Univ, Coll Comp & Informat Sci, Chongqing 401331, Peoples R China
[2] Chongqing Engn Res Ctr, Educ Big Data Intelligent Percept & Applicat, Chongqing 401331, Peoples R China
[3] Southwest Univ, Coll Elect & Informat Engn, Chongqing 400715, Peoples R China
Keywords
Object detection; Remote sensing images; Small object; Attention mechanism; CONVOLUTIONAL NETWORKS;
DOI
10.1007/s00371-024-03434-y
CLC number
TP31 [Computer software];
Discipline classification codes
081202 ; 0835 ;
Abstract
Object detection is an essential task in remote sensing image processing. However, remote sensing images are characterized by a large range of object sizes and complex backgrounds, which makes object detection challenging, and the performance of existing detectors on such images remains unsatisfactory. To tackle these problems, this paper proposes an object detection model for remote sensing images, named YOLO-SSP, built on the YOLOv8m model. First, the original downsampling layers are replaced with the proposed lightweight SPD-Conv module, which downsamples without losing fine-grained information and improves the network's ability to learn feature representations. Second, to handle the large number of small objects in remote sensing images, a small-object detection layer is added, which yields the expected gains. Finally, a pyramid spatial attention mechanism is proposed that obtains weights for different spatial positions through hierarchical pooling operations, effectively improving the detection of small objects and objects with complex backgrounds. We conducted ablation experiments on the DIOR dataset and compared YOLO-SSP with other state-of-the-art models. YOLO-SSP obtains 64.7% mAP, an improvement of 2.3% over the baseline model. To demonstrate the generalizability and robustness of the improved model, comparison experiments were also performed on the TGRS-HRRSD and SIMD datasets, achieving mAP of 77.2% and 64.9%, respectively. The code will be available at https://github.com/YongliLiu/SSP.
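The two core ideas in the abstract can be illustrated with a minimal NumPy sketch. The `space_to_depth` function shows the standard space-to-depth rearrangement underlying SPD-Conv: each 2x2 spatial patch is moved into the channel dimension, so resolution is halved without discarding any activations (the subsequent non-strided convolution is omitted here). The `pyramid_spatial_attention` function is a hypothetical stand-in for the paper's pyramid spatial attention, since the record does not give its exact structure: it pools a channel-averaged saliency map at several grid sizes and fuses the results into per-position weights. Both function names and the choice of scales are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def space_to_depth(x, block=2):
    """Lossless 2x downsampling: move each block x block spatial patch
    into the channel dimension instead of discarding pixels."""
    c, h, w = x.shape
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)            # (C, b, b, H/b, W/b)
    return x.reshape(c * block * block, h // block, w // block)

def pyramid_spatial_attention(x, scales=(1, 2, 4)):
    """Hypothetical sketch: spatial weights from hierarchical average
    pooling of a channel-averaged map, applied to every channel."""
    c, h, w = x.shape
    sal = x.mean(axis=0)                      # channel-averaged map (H, W)
    fused = np.zeros_like(sal)
    for s in scales:                          # pool at several grid sizes
        pooled = sal.reshape(h // s, s, w // s, s).mean(axis=(1, 3))
        fused += np.repeat(np.repeat(pooled, s, axis=0), s, axis=1)
    weights = 1.0 / (1.0 + np.exp(-fused / len(scales)))  # sigmoid gate
    return x * weights[None, :, :]            # reweight each position

feat = np.arange(2 * 8 * 8, dtype=np.float32).reshape(2, 8, 8)
down = space_to_depth(feat)                   # (8, 4, 4), no values lost
attn = pyramid_spatial_attention(down)
print(down.shape, attn.shape)                 # (8, 4, 4) (8, 4, 4)
```

Because space-to-depth is a pure permutation of the input, every value of `feat` survives into `down`; this is the "downsampling without loss of fine-grained information" property the abstract claims for SPD-Conv.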
Pages: 1467-1484
Page count: 18