Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery

Cited by: 117
Authors
Fang Qingyun [1]
Wang Zhaokui [1]
Affiliations
[1] Tsinghua Univ, Sch Aerosp Engn, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modality; Attention; Feature fusion; Object detection; Multispectral remote sensing imagery; NETWORK;
DOI
10.1016/j.patcog.2022.108786
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Fusing the complementary information in multispectral remote sensing image pairs across modalities can improve the perception ability of detection algorithms, making them more robust and reliable in a wider range of applications, such as nighttime detection. In contrast to prior methods, we argue that different features should be treated differently: modality-specific features should be retained and enhanced, while modality-shared features should be cherry-picked from the RGB and thermal IR modalities. Following this idea, we propose a novel and lightweight multispectral feature fusion approach with joint common-modality and differential-modality attention, named Cross-Modality Attentive Feature Fusion (CMAFF). Given the intermediate feature maps of RGB and thermal images, our module infers attention maps in parallel from two separate modalities, common-modality and differential-modality, and then multiplies each attention map with the input feature map for adaptive feature enhancement or selection. Extensive experiments demonstrate that the proposed approach achieves state-of-the-art performance at a low computational cost. © 2022 Elsevier Ltd. All rights reserved.
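The data flow described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how the attention maps are computed, so the half-sum/half-difference decomposition, the parameter-free pooling with sigmoid gates, the final summation, and the function name `attentive_fusion_sketch` are all assumptions chosen only to show a channel gate driven by the differential part and a spatial gate driven by the common part.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, used here as a simple gate in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def attentive_fusion_sketch(f_rgb, f_ir):
    """Fuse two feature maps of shape (C, H, W). Hypothetical illustration only."""
    # Assumed decomposition: f_rgb = common + diff, f_ir = common - diff.
    common = (f_rgb + f_ir) / 2.0
    diff = (f_rgb - f_ir) / 2.0

    # Differential branch (assumption): one gate per channel, from global
    # average pooling of the differential features. Shape (C, 1, 1).
    w = sigmoid(diff.mean(axis=(1, 2), keepdims=True))
    rgb_enhanced = f_rgb * (1.0 + w)          # enhance modality-specific RGB
    ir_enhanced = f_ir * (1.0 + (1.0 - w))    # complementary gate for thermal

    # Common branch (assumption): one gate per pixel, from channel-wise
    # average pooling of the common features. Shape (1, H, W).
    s = sigmoid(common.mean(axis=0, keepdims=True))

    # Select shared content spatially, then sum the two streams (assumption).
    return rgb_enhanced * s + ir_enhanced * s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.normal(size=(4, 8, 8))
    ir = rng.normal(size=(4, 8, 8))
    print(attentive_fusion_sketch(rgb, ir).shape)
```

In a trained network the two gates would come from learned layers rather than plain pooling; the sketch only shows where each attention map is multiplied into the features.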
Pages: 14