Towards controllable image descriptions with semi-supervised VAE

Cited by: 3
Authors
Zakharov, Nikolai [1]
Su, Hang [1]
Zhu, Jun [1]
Glaescher, Jan [2]
Affiliations
[1] Tsinghua Univ, Inst AI THBI Lab, State Key Lab Intell Tech & Sys, Dept Comp Sci & Tech, BNRist Ctr, Beijing 100084, Peoples R China
[2] Univ Med Ctr Hamburg Eppendorf, Inst Syst Neurosci, Hamburg, Germany
Funding
Beijing Natural Science Foundation;
Keywords
VAE; Image caption; Generative models; Semi-supervised;
DOI
10.1016/j.jvcir.2019.102574
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Image captioning models successfully describe the visual contents of images using natural language. To generate more natural and diverse descriptions, a model must learn style-specific patterns, which requires collecting style-specific datasets, a time-consuming process. To address this issue, we propose a semi-supervised deep generative model, the Semi-supervised Conditional Variational Auto-Encoder (SCVAE). Our model can leverage more data, both labelled and unlabelled, within the generative modelling framework. Extensive empirical results demonstrate that, compared with state-of-the-art models, our proposed method generates more accurate image captions in a wider range of styles. © 2019 Elsevier Inc. All rights reserved.
Pages: 8