Improve Image Captioning by Self-attention

Times cited: 4
Authors
Li, Zhenru [1]
Li, Yaoyi [1]
Lu, Hongtao [1]
Affiliation
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
Source
NEURAL INFORMATION PROCESSING, ICONIP 2019, PT V | 2019 / Vol. 1143
Keywords
Image captioning; Self-attention
DOI
10.1007/978-3-030-36802-9_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The conventional attention mechanism has been widely adopted in prevalent image captioning frameworks. In most prior work, attention weights are determined only by the visual features and the hidden state of a recurrent neural network (RNN), while the interactions among visual features are not modelled. In this paper, we introduce self-attention into the current image captioning framework to leverage the non-local correlations among visual features. Moreover, we propose three distinct methods to fuse self-attention with the conventional attention mechanism. Extensive experiments on the MSCOCO dataset show that self-attention enables the captioning model to achieve performance competitive with state-of-the-art methods.
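The abstract contrasts two ways of weighting visual features: conventional attention, where each feature's weight depends only on that feature and the RNN hidden state, and self-attention, where every feature is re-expressed using its correlations with all other features. The following is a minimal pure-Python sketch of that contrast under stated assumptions: scaled dot-product scores (this record does not give the paper's actual scoring function), no learned query/key/value projections, and a hypothetical residual fusion (`fused_attention`) that is one simple possibility, not a reproduction of any of the paper's three fusion methods. All function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def conventional_attention(features: List[Vector], hidden: Vector) -> Vector:
    """Each weight depends only on one feature and the RNN hidden state.

    Assumption: scaled dot-product scoring; the paper may use another form.
    """
    scale = math.sqrt(len(hidden))
    weights = softmax([dot(v, hidden) / scale for v in features])
    return weighted_sum(weights, features)


def self_attention(features: List[Vector]) -> List[Vector]:
    """Each feature becomes a mixture of all features (non-local interaction).

    Assumption: identity query/key/value maps instead of learned projections.
    """
    scale = math.sqrt(len(features[0]))
    refined = []
    for query in features:
        weights = softmax([dot(query, key) / scale for key in features])
        refined.append(weighted_sum(weights, features))
    return refined


def fused_attention(features: List[Vector], hidden: Vector) -> Vector:
    """Hypothetical fusion: add self-attended features residually, then
    apply conventional attention conditioned on the hidden state."""
    attended = self_attention(features)
    refined = [[a + b for a, b in zip(v, s)] for v, s in zip(features, attended)]
    return conventional_attention(refined, hidden)


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy region features
    h = [2.0, 0.0]  # toy RNN hidden state
    print(conventional_attention(regions, h))
    print(self_attention(regions))
    print(fused_attention(regions, h))
```

In this toy setup the hidden state `[2.0, 0.0]` favours regions with a large first component, so the conventional context vector's first entry exceeds its second. `self_attention`, by contrast, never looks at the hidden state: its output for each region depends only on the region set itself, which is the non-local interaction the abstract says earlier attention models leave unmodelled.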
Pages: 91-98
Number of pages: 8