Image Captioning with Generative Adversarial Network

Cited by: 10

Authors:

Amirian, Soheyla [1]

Rasheed, Khaled [1,2]

Taha, Thiab R. [1]

Arabnia, Hamid R. [1]

Affiliations:

[1] Univ Georgia, Dept Comp Sci, Athens, GA 30602 USA

[2] Univ Georgia, Inst Artificial Intelligence, Athens, GA 30602 USA

Source:

2019 6TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI 2019) | 2019

Keywords:

Deep Learning; Generative Adversarial Network; Image Captioning;

DOI:

10.1109/CSCI49370.2019.00055

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Automatic image annotation, automatic image tagging, and image linguistic indexing functions use methodologies that significantly overlap. In this paper, we use the general term, image captioning, to refer to all forms of such functions. Image captioning is the process of automatically generating metadata in the form of captioning (i.e., generating sentences that describe the content of the image). Image captioning is used in image retrieval systems to locate images of interest from a database, web, or personal devices. In recent years, investigators have been using Deep Learning to caption images with some success. However, the reported results do suffer from a number of deficiencies; namely, accuracy, lack of diversity and emotions in resultant captions. In order to address some of these deficiencies, we propose to use Generative Adversarial models to produce new and combinatorial samples. More specifically, we propose to explore various autoencoders to generate more accurate and meaningful captions for images. Autoencoders are neural networks that learn data codings in an unsupervised manner. The research outlined in this paper is an ongoing investigative research project.

Citation

Pages: 272-275

Page count: 4

17 entries in total

[1]

Amirian Soheyla, 2018, Proceedings of International Conference on Computational Science and Computational Intelligence (CSCI 2018: December 2018, USA)

[2]

"Artificial Intelligence" Research Track (CSCI-ISAI), P1132

[3]

Amirian Soheyla, 2019, The 23rd International Conference on Image Processing, Computer Vision and Pattern Recognition (IPCV'19), World Congress in Computer Science, Computer Engineering and Applied Computing (CSCE19), P10

[4]

[Anonymous], 2015, CoRR abs/1511.05644

[5]

[Anonymous], 2017, ARXIV170107875

[6]

[Anonymous], 2017, ADV NEURAL INFORM PR, DOI 10.1109/ICCV.2017.323

[7]

Dai, Bo; Fidler, Sanja; Urtasun, Raquel; Lin, Dahua. Towards Diverse and Natural Image Descriptions via a Conditional GAN [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 2989-2998

[8]

Diop R, 2011, BIOL MED PHYS BIOMED, P227, DOI 10.1007/978-1-4419-7835-6_10

[9]

Farhadi, Ali; Hejrati, Mohsen; Sadeghi, Mohammad Amin; Young, Peter; Rashtchian, Cyrus; Hockenmaier, Julia; Forsyth, David. Every Picture Tells a Story: Generating Sentences from Images [J]. COMPUTER VISION-ECCV 2010, PT IV, 2010, 6314: 15-+

[10]

Goodfellow IJ, 2014, ADV NEUR IN, V27, P2672
