Improve Image Captioning by Self-attention

Cited by: 4

Authors:

Li, Zhenru [1]

Li, Yaoyi [1]

Lu, Hongtao [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China

Source:

NEURAL INFORMATION PROCESSING, ICONIP 2019, PT V | 2019 / Vol. 1143

Keywords:

Image captioning; Self-attention;

DOI:

10.1007/978-3-030-36802-9_11

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The common attention mechanism has been widely adopted in prevalent image captioning frameworks. In most prior work, attention weights are determined only by the visual features and the hidden states of a Recurrent Neural Network (RNN), while the interactions among the visual features themselves are not modelled. In this paper, we introduce self-attention into the current image captioning framework to leverage the nonlocal correlations among visual features. Moreover, we propose three distinct methods to fuse self-attention with the conventional attention mechanism. Extensive experiments on the MSCOCO dataset show that self-attention enables the captioning model to achieve performance competitive with state-of-the-art methods.


Pages: 91-98

Page count: 8

