Denoising-Based Multiscale Feature Fusion for Remote Sensing Image Captioning

Cited by: 63
Authors
Huang, Wei [1 ,2 ]
Wang, Qi [1 ,2 ]
Li, Xuelong [1 ,2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Ctr Opt Imagery Anal & Learning OPTIMAL, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Remote sensing; Sensors; Noise reduction; Visualization; Atmospheric modeling; Nonhomogeneous media; Deep learning; encoder-decoder; feature fusion; image captioning; multiscale; remote sensing; ATTENTION;
DOI
10.1109/LGRS.2020.2980933
CLC Classification
P3 [Geophysics]; P59 [Geochemistry];
Subject Classification Codes
0708 ; 070902 ;
Abstract
With the benefits of deep learning technology, generating captions for remote sensing images has become achievable, and great progress has been made in this field in recent years. However, the large-scale variation of remote sensing images, which can lead to errors or omissions in feature extraction, still limits further improvement of caption quality. To address this problem, we propose a denoising-based multiscale feature fusion (DMSFF) mechanism for remote sensing image captioning in this letter. The proposed DMSFF mechanism aggregates multiscale features with a denoising operation at the stage of visual feature extraction. It helps the encoder-decoder framework, which is widely used in image captioning, obtain a denoised multiscale feature representation. In experiments, we apply the proposed DMSFF in the encoder-decoder framework and perform comparative experiments on two public remote sensing image captioning data sets, UC Merced (UCM)-captions and Sydney-captions. The experimental results demonstrate the effectiveness of our method.
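The abstract describes the mechanism only at a high level: denoise features at each scale during visual feature extraction, then aggregate them before the caption decoder. The record gives no implementation details, so the following is a minimal NumPy sketch of that idea under stated assumptions: the feature maps, the mean-filter denoiser, the nearest-neighbour upsampling, and the averaging fusion are all hypothetical placeholders, not the letter's actual operations.

```python
import numpy as np

def mean_denoise(fmap, k=3):
    """Placeholder denoising: a k x k mean filter per spatial position.
    The letter's actual denoising operation is not specified in this record."""
    pad = k // 2
    padded = np.pad(fmap, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(fmap)
    h, w, _ = fmap.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean(axis=(0, 1))
    return out

def upsample_nearest(fmap, target_hw):
    """Nearest-neighbour upsampling so all scales share one spatial size."""
    h, w, _ = fmap.shape
    th, tw = target_hw
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    return fmap[rows][:, cols]

def dmsff(feature_maps):
    """Hypothetical fusion: denoise each scale, align to the finest
    resolution, and average; the result would feed the caption decoder."""
    target = max((f.shape[0], f.shape[1]) for f in feature_maps)
    aligned = [upsample_nearest(mean_denoise(f), target) for f in feature_maps]
    return np.mean(aligned, axis=0)

# Simulated multiscale encoder outputs (e.g. three CNN stages, 256 channels).
rng = np.random.default_rng(0)
feats = [rng.standard_normal((s, s, 256)) for s in (7, 14, 28)]
fused = dmsff(feats)
print(fused.shape)  # (28, 28, 256)
```

The sketch only illustrates the data flow "multiscale features → per-scale denoising → spatial alignment → fusion"; in the actual letter the encoder is a CNN and the fused representation conditions an encoder-decoder captioning model.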
Pages: 436-440
Page count: 5