RMT-YOLOv9s: An Infrared Small Target Detection Method Based on UAV Remote Sensing Images

Cited by: 9

Authors:

Xu, Keyu [1]

Song, Chengtian [1,2]

Xie, Yue [2]

Pan, Lizhi [1]

Gan, Xiaozheng [1]

Huang, Gao [2,3]

Affiliations:

[1] Beijing Inst Technol, Sch Elect & Mech, Beijing 100081, Peoples R China

[2] Sci & Technol Electromech Dynam Control Lab, Xian 710065, Peoples R China

[3] Beijing Univ Technol, Sch Informat Sci & Technol, Beijing 100124, Peoples R China

Source:

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS | 2024 / Vol. 21

Keywords:

Feature extraction; Object detection; Neck; Transformers; Accuracy; Head; Computer vision; Vehicle dynamics; Semantics; Remote sensing; Dysample; efficient multiscale attention (EMA); retentive networks meet vision transformer (RMT) transformer; unmanned aerial vehicle (UAV) infrared target detection; YOLOv9;

DOI:

10.1109/LGRS.2024.3484748

Chinese Library Classification (CLC) number:

P3 [Geophysics]; P59 [Geochemistry]

Discipline classification codes:

0708; 070902

Abstract:

Unmanned aerial vehicles (UAVs) and infrared imaging technology have numerous applications in civilian fields. To address the low accuracy caused by complex ground backgrounds, small target sizes, and limited target features in target detection on UAV remote sensing infrared images, we combine the YOLOv9s model with the recent retentive networks meet vision transformers (RMT) architecture and propose the RMT-YOLOv9s model for infrared small target detection. First, a convolutional neural network (CNN)-RMT-based backbone is built by incorporating the RMT model into the backbone network of YOLOv9s, so that it extracts both local and global features for small target detection. Then, an improved neck multiscale feature-fusion network, RMTELAN-PANet, is designed using the novel convolutional RMTELAN module proposed in this letter, which can better capture and use semantic information from feature maps. Finally, the efficient multiscale attention (EMA) module and the Dysample upsampling module are integrated into RMTELAN-PANet to further enrich the feature information of small targets. Experiments on the HIT-UAV dataset show that RMT-YOLOv9s outperforms other popular methods in infrared small target detection.

Pages: 5

References (23 in total):

[1] Derivative Entropy-Based Contrast Measure for Infrared Small-Target Detection [J].
Bai, Xiangzhi; Bi, Yanguang.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2018, 56 (04): 2452-2466

[2] End-to-End Object Detection with Transformers [J].
Carion, Nicolas; Massa, Francisco; Synnaeve, Gabriel; Usunier, Nicolas; Kirillov, Alexander; Zagoruyko, Sergey.
COMPUTER VISION - ECCV 2020, PT I, 2020, 12346: 213-229

[3] RMT: Retentive Networks Meet Vision Transformers [J].
Fan, Qihang; Huang, Huaibo; Chen, Mingrui; Liu, Hongmin; He, Ran.
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024: 5641-5651

[4] Rich feature hierarchies for accurate object detection and semantic segmentation [J].
Girshick, Ross; Donahue, Jeff; Darrell, Trevor; Malik, Jitendra.
2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014: 580-587

[5] Liu RT, 2022, CYBORG BIONIC SYST, V2022, DOI [10.34133/2022/9780569, 10.1109/TAI.2022.3214486]

[6] SSD: Single Shot MultiBox Detector [J].
Liu, Wei; Anguelov, Dragomir; Erhan, Dumitru; Szegedy, Christian; Reed, Scott; Fu, Cheng-Yang; Berg, Alexander C.
COMPUTER VISION - ECCV 2016, PT I, 2016, 9905: 21-37

[7] Learning to Upsample by Learning to Sample [J].
Liu, Wenze; Lu, Hao; Fu, Hongtao; Cao, Zhiguo.
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 6004-6014

[8] Ouyang Daliang, 2023, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), P1, DOI 10.1109/ICASSP49357.2023.10096516

[9] AIMED-Net: An Enhancing Infrared Small Target Detection Net in UAVs with Multi-Layer Feature Enhancement for Edge Computing [J].
Pan, Lehao; Liu, Tong; Cheng, Jianghua; Cheng, Bang; Cai, Yahui.
REMOTE SENSING, 2024, 16 (10)

[10] You Only Look Once: Unified, Real-Time Object Detection [J].
Redmon, Joseph; Divvala, Santosh; Girshick, Ross; Farhadi, Ali.
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 779-788
