Multispectral Object Detection Based on Multilevel Feature Fusion and Dual Feature Modulation

Times cited: 4

Authors:

Sun, Jin [1]

Yin, Mingfeng [1]

Wang, Zhiwei [1]

Xie, Tao [1]

Bei, Shaoyi [1]

Affiliation:

[1] Jiangsu Univ Technol, Sch Automobile & Traff Engn, Changzhou 213001, Peoples R China

Source:

ELECTRONICS | 2024 / Vol. 13 / Issue 2

Funding:

National Natural Science Foundation of China

Keywords:

multispectral object detection; remote sensing; visible-infrared images; multilevel feature fusion; dual feature modulation; FASTER R-CNN; NETWORK

DOI:

10.3390/electronics13020443

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812 ;

Abstract:

Multispectral object detection is a crucial technology in remote sensing image processing, particularly in low-light environments. Most current methods extract features at a single scale, which leads to the fusion of invalid features and missed detections of small objects. To address these issues, we propose a multispectral object detection network based on multilevel feature fusion and dual feature modulation (GMD-YOLO). First, a novel dual-channel CSPDarknet53 network extracts deep features from visible and infrared images. This network incorporates a Ghost module, which generates additional feature maps through a series of linear operations, balancing accuracy and speed. Second, a multilevel feature fusion (MLF) module exploits cross-modal information by constructing hierarchical residual connections. This strengthens the complementarity between modalities and improves the network's multiscale representation at a finer granularity. Finally, a dual feature modulation (DFM) decoupled head is introduced to enhance small object detection; decoupling the head meets the distinct requirements of the classification and localization tasks. GMD-YOLO is validated on three public visible-infrared datasets: DroneVehicle, KAIST, and LLVIP. On DroneVehicle and LLVIP, it achieves an mAP@0.5 of 78.0% and 98.0%, outperforming the baseline by 3.6% and 4.4%, respectively. On KAIST, it achieves a miss rate (MR) of 7.73% at 61.7 FPS. Experimental results demonstrate that our method surpasses existing state-of-the-art methods and exhibits strong robustness.
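The abstract describes the Ghost module only as generating extra feature maps "through a series of linear operations". The sketch below illustrates the general idea as introduced in GhostNet, not GMD-YOLO's actual code. It rests on several assumptions: a ratio of 2 (one ghost map per intrinsic map), a 1x1 primary convolution, a 3x3 depthwise convolution as the cheap linear operation, ReLU activations, and no BatchNorm. All function names (`pointwise_conv`, `depthwise_conv3x3`, `ghost_module`) are hypothetical and written in plain NumPy.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def pointwise_conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    c_in, h, wd = x.shape
    return (w @ x.reshape(c_in, h * wd)).reshape(w.shape[0], h, wd)


def depthwise_conv3x3(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Per-channel 3x3 cross-correlation with zero padding and stride 1.

    x is (C, H, W) and k is (C, 3, 3): each channel has its own filter,
    which serves as the cheap linear operation (assumed, not from the paper).
    """
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros(x.shape, dtype=np.result_type(x, k))
    for i in range(3):
        for j in range(3):
            out += k[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]
    return out


def ghost_module(x: np.ndarray, w_primary: np.ndarray, k_cheap: np.ndarray) -> np.ndarray:
    """Hypothetical GhostNet-style module with ratio 2 (BatchNorm omitted).

    A 1x1 convolution yields m "intrinsic" maps; a depthwise 3x3 convolution
    derives one "ghost" map from each; both are concatenated into 2m channels.
    """
    intrinsic = relu(pointwise_conv(x, w_primary))
    ghost = relu(depthwise_conv3x3(intrinsic, k_cheap))
    return np.concatenate([intrinsic, ghost], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c_in, m = 16, 4
    x = rng.standard_normal((c_in, 8, 8))
    y = ghost_module(x, rng.standard_normal((m, c_in)), rng.standard_normal((m, 3, 3)))
    print(y.shape)  # (8, 8, 8)
    # Weight count versus a plain 1x1 convolution producing the same 2m channels
    print("ghost:", m * c_in + m * 9, "plain:", 2 * m * c_in)  # ghost: 100 plain: 128
```

In this toy setting the module produces 8 output channels with 100 weights instead of 128. The saving grows with the number of input channels, because only half of the outputs pay the full cross-channel cost. This is the accuracy/speed trade-off the abstract refers to, although the exact configuration used in GMD-YOLO's backbone is not stated here.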

Pages: 18

Number of references: 53

[1] Effectiveness Guided Cross-Modal Information Sharing for Aligned RGB-T Object Detection
An, Zijia
Liu, Chunlei
Han, Yuqi
[J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2562 - 2566
[2] Dual-YOLO Architecture from Infrared and Visible Images for Object Detection
Bao, Chun
Cao, Jie
Hao, Qun
Cheng, Yang
Ning, Yaqian
Zhao, Tianhua
[J]. SENSORS, 2023, 23 (06)
[3] Biswas M, 2023, Arxiv, DOI [arXiv:2308.06983, 10.48550/ARXIV.2308.069832308.06983]
[4] Bochkovskiy A, 2020, Arxiv, DOI arXiv:2004.10934
[5] Disentangle Your Dense Object Detector
Chen, Zehui
Yang, Chenhongyi
Li, Qiaofei
Zhao, Feng
Zha, Zheng-Jun
Wu, Feng
[J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4939 - 4948
[6] KAIST Multi-Spectral Day/Night Data Set for Autonomous and Assisted Driving
Choi, Yukyung
Kim, Namil
Hwang, Soonmin
Park, Kibaek
Yoon, Jae Shin
An, Kyounghwan
Kweon, In So
[J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2018, 19 (03) : 934 - 948
[7] Dai JF, 2016, ADV NEUR IN, V29
[8] diaeresis>rg Wagner Jo<spacing, 2016, ESANN
[9] Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery
Fang Qingyun
Wang Zhaokui
[J]. PATTERN RECOGNITION, 2022, 130
[10] Convolutional Two-Stream Network Fusion for Video Action Recognition
Feichtenhofer, Christoph
Pinz, Axel
Zisserman, Andrew
[J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 1933 - 1941