Scene Attention Mechanism for Remote Sensing Image Caption Generation

Cited by: 30
Authors
Wu, Shiqi [1]
Zhang, Xiangrong [1]
Wang, Xin [1]
Li, Chen [2]
Jiao, Licheng [1]
Affiliations
[1] Xidian Univ, Minist Educ, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China
[2] Xi An Jiao Tong Univ, Sch Comp Sci & Technol, Xian 710049, Peoples R China
Source
2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2020
Funding
National Natural Science Foundation of China
Keywords
remote sensing image captioning; convolutional neural network; long short-term memory network; scene attention mechanism
DOI
10.1109/ijcnn48605.2020.9207381
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Remote sensing images play an important role in many applications. To make remote sensing images easier for humans to understand, the task of remote sensing image captioning is attracting growing research attention. Inspired by the way humans receive visual information, attention mechanisms have been widely used in remote sensing image understanding. To capture more scene information and improve the stability of the generated sentences, a new attention mechanism called scene attention is proposed. In addition to the attention computed from the current hidden state of the long short-term memory (LSTM) network, the proposed method simultaneously exploits global visual information from the mean of all convolutional features. The effectiveness of the proposed method is evaluated on the UCM-captions, Sydney-captions, and RSICD datasets. The experimental results show that, compared with several other captioning methods, the proposed method is more stable and achieves better performance.
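The abstract gives no equations, so the following is only a minimal sketch of how attention conditioned on both the LSTM hidden state and the mean convolutional feature might be computed. The additive (tanh) scoring form, the function name `scene_attention`, and the weight names `W_v`, `W_h`, `W_g`, and `w` are all assumptions made for illustration, not the authors' formulation.

```python
import math


def mean_feature(features):
    """Average the k region features (each of length d) into one global vector."""
    k = len(features)
    return [sum(column) / k for column in zip(*features)]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scene_attention(features, hidden, W_v, W_h, W_g, w):
    """Hypothetical scene attention step.

    features: k convolutional region features, each of length d
    hidden:   current LSTM hidden state, length m
    W_v, W_g: (a x d) projections; W_h: (a x m) projection; w: length-a vector
    Returns the attention weights and the attended context vector.
    """
    g = mean_feature(features)      # global scene information
    h_proj = matvec(W_h, hidden)    # contribution of the current hidden state
    g_proj = matvec(W_g, g)         # contribution of the mean feature
    scores = []
    for v in features:
        v_proj = matvec(W_v, v)
        act = [math.tanh(a + b + c) for a, b, c in zip(v_proj, h_proj, g_proj)]
        scores.append(sum(w_i * a_i for w_i, a_i in zip(w, act)))
    alpha = softmax(scores)
    d = len(features[0])
    context = [sum(a * v[j] for a, v in zip(alpha, features)) for j in range(d)]
    return alpha, context


# Toy usage: 3 regions with 2-dimensional features and a 2-dimensional hidden state.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [0.5, -0.5]
W_v = [[0.1, 0.2], [0.3, -0.1]]
W_h = [[0.2, 0.0], [0.0, 0.2]]
W_g = [[0.1, 0.1], [-0.1, 0.1]]
w = [1.0, -1.0]
alpha, context = scene_attention(features, hidden, W_v, W_h, W_g, w)
```

In this sketch the mean feature enters every region's score identically, so it shifts the scores through the tanh nonlinearity rather than selecting a region on its own; the paper may combine the two signals differently.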
Pages: 7