Improvement of Embedding Channel-Wise Activation in Soft-Attention Neural Image Captioning

Cited by: 0
Authors
Li, Yanke [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Appl Math, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON VISION, IMAGE AND SIGNAL PROCESSING (ICVISP 2018) | 2018
Keywords
Scene Understanding; Image Captioning; Deep learning; Soft Attention;
DOI
10.1145/3271553.3271592
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
This paper studies image captioning with the soft attention algorithm. We first review related work as background and then explain the original model in detail. On top of the plain soft attention model, we propose two approaches for further improvement: an SE attention model, which adds an extra channel-wise activation layer, and a bi-directional attention model, which explores the feasibility of a two-way attention order. We implement both methods under limited experimental conditions and, in addition, replace the original encoder with a state-of-the-art architecture. Quantitative results and example demonstrations show that the proposed methods outperform the baselines. Finally, for completeness, we summarize suggestions for future work building on the proposed methods.
Pages: 9
Related papers
18 in total
[1]  
Bahdanau D, 2016, Arxiv, DOI arXiv:1409.0473
[2]  
Banerjee S, 2005, P ACL 2005 WORKSHOP, P65
[3]   Learning Where to Attend with Deep Architectures for Image Tracking [J].
Denil, Misha ;
Bazzani, Loris ;
Larochelle, Hugo ;
de Freitas, Nando .
NEURAL COMPUTATION, 2012, 24 (08) :2151-2184
[4]  
Hochreiter S, 1997, NEURAL COMPUT, V9, P1735, DOI [10.1162/neco.1997.9.8.1735, 10.1007/978-3-642-24797-2, 10.1162/neco.1997.9.1.1]
[5]   Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics [J].
Hodosh, Micah ;
Young, Peter ;
Hockenmaier, Julia .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2013, 47 :853-899
[6]  
Hu J, 2018, PROC CVPR IEEE, P7132, DOI [10.1109/CVPR.2018.00745, 10.1109/TPAMI.2019.2913372]
[7]   A model of saliency-based visual attention for rapid scene analysis [J].
Itti, L ;
Koch, C ;
Niebur, E .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1998, 20 (11) :1254-1259
[8]   DenseCap: Fully Convolutional Localization Networks for Dense Captioning [J].
Johnson, Justin ;
Karpathy, Andrej ;
Fei-Fei, Li .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :4565-4574
[9]  
Karpathy A, 2015, PROC CVPR IEEE, P3128, DOI 10.1109/CVPR.2015.7298932
[10]   Microsoft COCO: Common Objects in Context [J].
Lin, Tsung-Yi ;
Maire, Michael ;
Belongie, Serge ;
Hays, James ;
Perona, Pietro ;
Ramanan, Deva ;
Dollar, Piotr ;
Zitnick, C. Lawrence .
COMPUTER VISION - ECCV 2014, PT V, 2014, 8693 :740-755