Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery

Times cited: 117

Authors:

Fang Qingyun [1]

Wang Zhaokui [1]

Affiliation:

[1] Tsinghua Univ, Sch Aerosp Engn, Beijing 100084, Peoples R China

Source:

PATTERN RECOGNITION | 2022 / Vol. 130

Funding:

National Natural Science Foundation of China;

Keywords:

Cross-modality; Attention; Feature fusion; Object detection; Multispectral remote sensing imagery; NETWORK;

DOI:

10.1016/j.patcog.2022.108786

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Cross-modality fusion of complementary information from multispectral remote sensing image pairs can improve the perception ability of detection algorithms, making them more robust and reliable across a wider range of applications, such as nighttime detection. Unlike prior methods, we argue that different features should be processed differently: modality-specific features should be retained and enhanced, while modality-shared features should be cherry-picked from the RGB and thermal IR modalities. Following this idea, we propose a novel, lightweight multispectral feature fusion approach with joint common-modality and differential-modality attentions, named Cross-Modality Attentive Feature Fusion (CMAFF). Given the intermediate feature maps of RGB and thermal images, our module infers attention maps in parallel from two separate modalities, the common modality and the differential modality; the attention maps are then multiplied with the input feature maps for adaptive feature enhancement or selection, respectively. Extensive experiments demonstrate that the proposed approach can achieve state-of-the-art performance at a low computational cost. (c) 2022 Elsevier Ltd. All rights reserved.
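The two parallel attention branches described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' CMAFF implementation. Every detail below is an assumption made for illustration: the split into half-sum (common) and half-difference (differential) features, the global-average-pooled channel descriptors, the tanh and softmax gates, the random stand-in weights (a trained model would learn them), the concatenation at the end, and the function names, which are hypothetical.

```python
import numpy as np


def global_avg_pool(x):
    """Average each channel over its spatial extent: (C, H, W) -> (C,)."""
    return x.mean(axis=(1, 2))


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


def attentive_fusion_sketch(f_rgb, f_ir, w_diff, w_common):
    """Hypothetical common/differential-modality attention fusion.

    f_rgb, f_ir : (C, H, W) intermediate feature maps of the RGB and IR branches.
    w_diff      : (C, C) placeholder for learned weights of the differential branch.
    w_common    : (2, C, C) placeholder for learned weights of the common branch,
                  one projection per modality.
    Returns the fused (3C, H, W) map and the (2, C) modality-selection weights.
    """
    # Assumed decomposition into a differential (modality-specific) part
    # and a common (modality-shared) part.
    f_diff = (f_rgb - f_ir) / 2.0
    f_common = (f_rgb + f_ir) / 2.0

    # Differential branch: per-channel attention in (-1, 1) computed from the
    # differential features. Each modality is enhanced with its own specific
    # part (+f_diff for RGB, -f_diff for IR), scaled by that attention.
    a_diff = np.tanh(w_diff @ global_avg_pool(f_diff))[:, None, None]
    rgb_enh = f_rgb + a_diff * f_diff
    ir_enh = f_ir - a_diff * f_diff

    # Common branch: per-channel logits for each modality, normalised across
    # the two modalities, so each shared channel is selected from RGB or IR.
    logits = w_common @ global_avg_pool(f_common)  # (2, C)
    weights = softmax(logits, axis=0)  # sums to 1 over modalities, per channel
    shared = weights[0][:, None, None] * f_rgb + weights[1][:, None, None] * f_ir

    # Assumed merge rule: stack enhanced and selected features along channels.
    fused = np.concatenate([rgb_enh, ir_enh, shared], axis=0)
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, h, w = 8, 16, 16
    f_rgb = rng.standard_normal((c, h, w))
    f_ir = rng.standard_normal((c, h, w))
    fused, weights = attentive_fusion_sketch(
        f_rgb, f_ir, rng.standard_normal((c, c)), rng.standard_normal((2, c, c))
    )
    print(fused.shape, weights.shape)  # (24, 16, 16) (2, 8)
```

The sketch concatenates instead of summing because, with this particular decomposition, adding the two enhanced maps would cancel the differential terms exactly. In a detection network, a learned projection would typically reduce the stacked channels again; that step is omitted here.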

Pages: 14

References: 57 in total
