Boosting Memory with a Persistent Memory Mechanism for Remote Sensing Image Captioning

Cited by: 11
Authors
Fu, Kun [1 ,2 ,3 ,4 ,5 ]
Li, Yang [2 ,3 ,4 ]
Zhang, Wenkai [1 ,4 ]
Yu, Hongfeng [1 ,4 ]
Sun, Xian [1 ,3 ,4 ]
Affiliations
[1] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Microelect, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100190, Peoples R China
[4] Chinese Acad Sci, Inst Elect, Key Lab Network Informat Syst Technol, Beijing 100190, Peoples R China
[5] Chinese Acad Sci, Inst Elect, Suzhou 215000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
image caption; remote sensing; long short-term memory; persistent memory mechanism; NETWORK;
DOI
10.3390/rs12111874
Chinese Library Classification
X [Environmental Science, Safety Science];
Discipline Classification Code
08 ; 0830 ;
Abstract
The encoder-decoder framework has been widely used for the remote sensing image captioning task. When remote sensing images with specific characteristics must be retrieved through their descriptive sentences, richer sentences improve the retrieval results. However, the Long Short-Term Memory (LSTM) network used in the decoder still loses some image information over time when the generated caption is long. In this paper, we present a new model component named the Persistent Memory Mechanism (PMM), which expands the information storage capacity of the LSTM with an external memory: a memory matrix of predetermined size that stores all of the LSTM's hidden-state vectors produced before the current time step. At each time step, the PMM retrieves from the external memory the previous information related to the current input, processes this captured long-term information together with the current information to predict the next word, and then updates the memory with the input. This method recovers long-term information that the LSTM has lost but that remains useful for caption generation. By applying this method to image captioning, our CIDEr scores on the UCM-Captions, Sydney-Captions, and RSICD datasets increased by 3%, 5%, and 7%, respectively.
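The read-then-write loop described in the abstract can be sketched as follows. This is a minimal illustrative sketch only: the dot-product attention over stored hidden states, the fixed-capacity eviction policy, and all names (`PersistentMemory`, `read`, `write`) are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

class PersistentMemory:
    """Toy external memory holding past LSTM hidden-state vectors."""

    def __init__(self, hidden_dim, capacity):
        self.memory = np.zeros((0, hidden_dim))  # grows up to `capacity` rows
        self.capacity = capacity

    def read(self, h_t):
        # Retrieve long-term information related to the current hidden state
        # via attention (dot-product scores) over all stored states.
        if self.memory.shape[0] == 0:
            return np.zeros_like(h_t)
        scores = self.memory @ h_t
        weights = softmax(scores)
        return weights @ self.memory  # weighted sum of past hidden states

    def write(self, h_t):
        # Append the current hidden state; evict the oldest row if full.
        self.memory = np.vstack([self.memory, h_t])
        if self.memory.shape[0] > self.capacity:
            self.memory = self.memory[1:]

# Usage: at each decoding step, read related past context, then store h_t.
rng = np.random.default_rng(0)
pm = PersistentMemory(hidden_dim=4, capacity=8)
for step in range(10):
    h_t = rng.standard_normal(4)  # stand-in for an LSTM hidden state
    context = pm.read(h_t)        # long-term info related to current input
    pm.write(h_t)
```

In the actual model the retrieved `context` would be combined with the LSTM output before word prediction; the sketch only shows the memory bookkeeping.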
Pages: 14