Towards controllable image descriptions with semi-supervised VAE

Cited by: 3

Authors:

Zakharov, Nikolai ^{[1]}

Su, Hang ^{[1]}

Zhu, Jun ^{[1]}

Glaescher, Jan ^{[2]}

Affiliations:

[1] Tsinghua Univ, Inst AI THBI Lab, State Key Lab Intell Tech & Sys, Dept Comp Sci & Tech, BNRist Ctr, Beijing 100084, Peoples R China

[2] Univ Med Ctr Hamburg Eppendorf, Inst Syst Neurosci, Hamburg, Germany

Source:

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION | 2019 / Volume 63

Funding:

Beijing Natural Science Foundation;

Keywords:

VAE; Image caption; Generative models; Semi-supervised;

DOI:

10.1016/j.jvcir.2019.102574

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Image captioning models can successfully describe the visual content of images in natural language. To generate more natural and diverse descriptions, however, a model must learn style-specific patterns, which requires collecting style-specific datasets, a time-consuming process. To address this issue, we propose a semi-supervised deep generative model, the Semi-supervised Conditional Variational Auto-Encoder (SCVAE). Within the generative modelling framework, our model can leverage more data, both labelled and unlabelled. Extensive empirical results demonstrate that, compared with state-of-the-art models, the proposed method generates more accurate image captions in a wider range of styles. (C) 2019 Elsevier Inc. All rights reserved.

Pages: 8