Towards Generating Stylized Image Captions via Adversarial Training

Cited by: 15
Authors
Nezami, Omid Mohamad [1 ,2 ]
Dras, Mark [1 ]
Wan, Stephen [2 ]
Paris, Cecile [1 ,2 ]
Hamey, Len [1 ]
Affiliations
[1] Macquarie Univ, Sydney, NSW, Australia
[2] CSIRO, Data61, Sydney, NSW, Australia
Source
PRICAI 2019: Trends in Artificial Intelligence, Part I | 2019 / Vol. 11670
Keywords
Image captioning; Attention mechanism; Adversarial training
DOI
10.1007/978-3-030-29908-8_22
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
While most image captioning aims to generate objective descriptions of images, the last few years have seen work on generating visually grounded image captions which have a specific style (e.g., incorporating positive or negative sentiment). However, because the stylistic component is typically the last part of training, current models usually pay more attention to the style at the expense of accurate content description. In addition, there is a lack of variability in terms of the stylistic aspects. To address these issues, we propose an image captioning model called ATTEND-GAN which has two core components: first, an attention-based caption generator to strongly correlate different parts of an image with different parts of a caption; and second, an adversarial training mechanism to assist the caption generator to add diverse stylistic components to the generated captions. Because of these components, ATTEND-GAN can generate correlated captions as well as more human-like variability of stylistic patterns. Our system outperforms the state-of-the-art as well as a collection of our baseline models. A linguistic analysis of the generated captions demonstrates that captions generated using ATTEND-GAN have a wider range of stylistic adjectives and adjective-noun pairs.
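The abstract's first component, an attention-based caption generator, correlates image regions with words of the caption. A minimal sketch of such region-level soft attention is shown below; all shapes, names, and the bilinear scoring form are illustrative assumptions, not the paper's actual ATTEND-GAN implementation.

```python
import numpy as np

def soft_attention(region_feats, hidden, W):
    """Soft attention over image regions (illustrative sketch).

    region_feats: (R, D) array of R region feature vectors.
    hidden:       (H,) decoder hidden state at the current word.
    W:            (D, H) hypothetical bilinear scoring matrix.
    Returns attention weights over regions and the attended context vector.
    """
    scores = region_feats @ W @ hidden        # one relevance score per region, shape (R,)
    scores -= scores.max()                    # stabilize the softmax numerically
    weights = np.exp(scores)
    weights /= weights.sum()                  # softmax: weights sum to 1
    context = weights @ region_feats          # weighted sum of region features, shape (D,)
    return weights, context

rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))             # 5 image regions, 8-dim features
hidden = rng.normal(size=4)                   # 4-dim decoder state
W = rng.normal(size=(8, 4))
w, ctx = soft_attention(regions, hidden, W)
```

In a full model, `ctx` would condition the next-word prediction, while the second component (a discriminator trained adversarially) would score generated captions for stylistic realism.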
Pages: 270-284 (15 pages)