Enhanced multihead self-attention block network for remote sensing image scene classification

Cited by: 1
Authors
Li, Yijin [1]
Wang, Jiaxin [1]
Chen, Sibao [1]
Tang, Jin [1]
Luo, Bin [1]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Anhui Prov Key Lab Multimodal Cognit Computat, IMIS Lab Anhui Prov, MOE Key Lab ICSP, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
remote sensing scene classification; convolutional neural networks; self-attention; MULTISCALE; SCALE; FEATURES;
DOI
10.1117/1.JRS.17.016517
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
Remote sensing image scene classification, which aims to assign semantic labels to land cover, has been widely researched. Although convolutional neural networks (CNNs) such as VGGNet and ResNet have achieved good performance, the complex backgrounds and redundant information in remote sensing images limit further gains in accuracy. We propose an enhanced multihead self-attention block network, which effectively reduces the adverse impact of the background and emphasizes the main information. Because the high-level information of a CNN may be redundant, we replace only the final three bottleneck blocks of ResNet50 with the enhanced multihead self-attention layer so that the model focuses more effectively on the salient region of each image. Our enhanced multihead self-attention layer provides the following improvements over the classical module. First, we construct a triple-way convolution to deal with the arbitrary directionality of remote sensing images and obtain more stable attention information. Then, improved relative position encodings are used to account for the relative distance between features at different locations. Finally, we use depthwise convolution and the InstanceNorm operation to restore the diversity of the multiple heads. Comparison and ablation experiments carried out on three public datasets show that our approach improves significantly upon the baseline and achieves remarkable performance compared with some state-of-the-art methods.
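For readers unfamiliar with the classical module that the abstract takes as its starting point, the following is a minimal sketch of multihead self-attention over a flattened feature map, with an optional additive position bias. It is not the authors' enhanced layer: the triple-way convolution, the improved relative position encodings, and the depthwise convolution with InstanceNorm are not specified in the abstract and are not reproduced here. The function names, the tiny 2 x 3 grid, and the distance-based bias in `distance_bias` are all assumptions made purely for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def multihead_self_attention(tokens, wq, wk, wv, num_heads, rel_bias=None):
    """Classical multihead self-attention (illustrative, not the paper's layer).

    tokens:   n x d features, one row per spatial location
    wq/wk/wv: d x d projection matrices
    rel_bias: optional n x n bias added to the attention scores
    Returns the n x d output and the per-head n x n attention weights.
    """
    n, d = len(tokens), len(tokens[0])
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    dh = d // num_heads
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    out = [[0.0] * d for _ in range(n)]
    weights = []
    for h in range(num_heads):
        lo, hi = h * dh, (h + 1) * dh  # channel slice owned by this head
        head_weights = []
        for i in range(n):
            scores = []
            for j in range(n):
                s = sum(q[i][c] * k[j][c] for c in range(lo, hi))
                s /= math.sqrt(dh)  # scaled dot product
                if rel_bias is not None:
                    s += rel_bias[i][j]
                scores.append(s)
            attn = softmax(scores)
            head_weights.append(attn)
            for c in range(lo, hi):
                out[i][c] = sum(attn[j] * v[j][c] for j in range(n))
        weights.append(head_weights)
    return out, weights


def distance_bias(height, width, slope=0.1):
    """Hypothetical position bias: penalize scores by Manhattan distance."""
    coords = [(r, c) for r in range(height) for c in range(width)]
    return [[-slope * (abs(r1 - r2) + abs(c1 - c2)) for (r2, c2) in coords]
            for (r1, c1) in coords]


# Toy usage: a 2 x 3 feature map with 4 channels and 2 heads.
random.seed(0)
H, W, D, HEADS = 2, 3, 4, 2


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


tokens = rand_matrix(H * W, D)
out, weights = multihead_self_attention(
    tokens, rand_matrix(D, D), rand_matrix(D, D), rand_matrix(D, D),
    num_heads=HEADS, rel_bias=distance_bias(H, W))
```

Each head attends over all locations using only its own slice of channels, which is why the diversity between heads matters: if the heads learn near-identical attention maps, the extra heads add little information.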
Pages: 21