Truncation Cross Entropy Loss for Remote Sensing Image Captioning

Cited by: 75
Authors
Li, Xuelong [1 ,2 ]
Zhang, Xueting [1 ,2 ]
Huang, Wei [1 ,2 ]
Wang, Qi [1 ,2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Ctr OPT IMagery Anal & Learning OPTIMAL, Xian 710072, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2021, Vol. 59, Issue 06
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Remote sensing; Entropy; Semantics; Decoding; Optimization; Visualization; Image captioning; overfitting; remote sensing; truncation cross entropy (TCE) loss; ATTENTION;
DOI
10.1109/TGRS.2020.3010106
CLC Classification Numbers
P3 [Geophysics]; P59 [Geochemistry];
Subject Classification Codes
0708; 070902;
Abstract
Recently, remote sensing image captioning (RSIC) has drawn increasing attention. In this field, encoder-decoder-based methods have become the mainstream owing to their excellent performance. In the encoder-decoder framework, a convolutional neural network (CNN) encodes a remote sensing image into a semantic feature vector, and a sequence model such as long short-term memory (LSTM) subsequently generates a content-related caption from that feature vector. During traditional training, the probability of the target word at each time step is forcibly optimized toward 1 by the cross entropy (CE) loss. However, because of the variability and ambiguity of possible image captions, the target word could be replaced by other words such as its synonyms, so this optimization strategy leads to overfitting of the network. In this article, we explore the overfitting phenomenon in RSIC caused by the CE loss and correspondingly propose a new truncation cross entropy (TCE) loss to alleviate it. To verify the effectiveness of the proposed approach, extensive comparison experiments are performed on three public RSIC data sets: UCM-captions, Sydney-captions, and RSICD. The state-of-the-art results on Sydney-captions and RSICD and the competitive results on UCM-captions achieved with the TCE loss demonstrate that the proposed method is beneficial to RSIC.
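The record does not reproduce the TCE formula, but the abstract's description (standard CE keeps pushing the target-word probability toward 1, which encourages overfitting when synonyms would also be acceptable) suggests a per-token cross entropy that is truncated once the prediction is already confident. The following PyTorch sketch illustrates that idea under stated assumptions: the function name `truncation_cross_entropy` and the `threshold` value are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def truncation_cross_entropy(logits, targets, threshold=0.9):
    """Minimal sketch of a truncated cross entropy loss (assumed form).

    Standard CE keeps pushing the target-word probability toward 1; here
    the per-token loss is zeroed out once the predicted probability of the
    target word already exceeds `threshold`, so the decoder is not forced
    to over-commit to a single wording.

    logits:    (batch, vocab_size) raw decoder scores for one time step
    targets:   (batch,) indices of the ground-truth words
    threshold: assumed truncation point (hyperparameter, not from the record)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    target_log_probs = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    ce = -target_log_probs                                   # standard per-token CE
    confident = target_log_probs.exp() > threshold           # already confident enough
    ce = torch.where(confident, torch.zeros_like(ce), ce)    # truncate those terms
    return ce.mean()
```

In a captioning decoder, such a term would replace the standard per-step CE loss, leaving tokens whose target probability already exceeds the assumed threshold out of the gradient.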
Pages: 5246-5257
Page count: 12