A Novel Image Captioning Method Based on Generative Adversarial Networks

Cited by: 1

Authors:

Fan, Yang ^{[1]}

Xu, Jungang ^{[1]}

Sun, Yingfei ^{[1]}

Wang, Yiyu ^{[1]}

Affiliations:

[1] Univ Chinese Acad Sci, Beijing, Peoples R China

Source:

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: TEXT AND TIME SERIES, PT IV | 2019 / Vol. 11730

Funding:

Beijing Natural Science Foundation;

Keywords:

LSTM; GAN; Generator; Discriminator; Matcher;

DOI:

10.1007/978-3-030-30490-4_23

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Although image captioning methods based on RNNs have made great progress in recent years, their output often lacks variability and ignores some minor information. In this paper, a novel image captioning method based on Generative Adversarial Networks is proposed, which improves the naturalness and diversity of image descriptions. In this method, a matcher is added to the generator to extract image features that do not appear in the standard description and then to produce descriptions conditioned on the image, while a discriminator assesses how well a description fits the visual content. It is noteworthy that training a sequence generator is nontrivial. Experiments on MSCOCO and Flickr30k show that the method performed competitively against real people in our user study and outperformed other methods on various tasks.


Pages: 281-292

Page count: 12

References: 31 in total

[1] Biswas P, 2005, I CONF VLSI DESIGN, P651

[2] Chao Y.-W., 2015, P ICCV, P4259

[3] Towards Diverse and Natural Image Descriptions via a Conditional GAN [J]. Dai, Bo; Fidler, Sanja; Urtasun, Raquel; Lin, Dahua. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 2989-2998

[4] Diop R, 2011, BIOL MED PHYS BIOMED, P227, DOI 10.1007/978-1-4419-7835-6_10

[5] Donahue J, 2015, PROC CVPR IEEE, P2625, DOI 10.1109/CVPR.2015.7298878

[6] Long-term Recurrent Merge Network Model for Image Captioning [J]. Fan, Yang; Xu, Jungang; Sun, Yingfei; He, Ben. 2018 IEEE 30TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2018: 254-259

[7] Every Picture Tells a Story: Generating Sentences from Images [J]. Farhadi, Ali; Hejrati, Mohsen; Sadeghi, Mohammad Amin; Young, Peter; Rashtchian, Cyrus; Hockenmaier, Julia; Forsyth, David. COMPUTER VISION-ECCV 2010, PT IV, 2010, 6314: 15-+

[8] Learning Attributes Equals Multi-Source Domain Generalization [J]. Gan, Chuang; Yang, Tianbao; Gong, Boqing. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 87-97

[9] Girshick R., 2014, IEEE COMP SOC C COMP, DOI 10.1109/CVPR.2014.81

[10] Goodfellow IJ, 2014, ADV NEUR IN, V27, P2672
