Boosted Attention: Leveraging Human Attention for Image Captioning

Citations: 33
Authors
Chen, Shi [1]
Zhao, Qi [1]
Affiliation
[1] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
Source
COMPUTER VISION - ECCV 2018, PT XI | 2018 / Vol. 11215
Funding
National Science Foundation (USA)
Keywords
Image captioning; Visual attention; Human attention
DOI
10.1007/978-3-030-01252-6_5
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual attention has shown usefulness in image captioning, with the goal of enabling a caption model to selectively focus on regions of interest. Existing models typically rely on top-down language information and learn attention implicitly by optimizing the captioning objectives. While somewhat effective, the learned top-down attention can fail to focus on correct regions of interest without direct supervision of attention. Inspired by the human visual system which is driven by not only the task-specific top-down signals but also the visual stimuli, we in this work propose to use both types of attention for image captioning. In particular, we highlight the complementary nature of the two types of attention and develop a model (Boosted Attention) to integrate them for image captioning. We validate the proposed approach with state-of-the-art performance across various evaluation metrics.
Pages: 72-88
Page count: 17