Deep Learning-Based Short Story Generation for an Image Using the Encoder-Decoder Structure

Citations: 9
Authors
Min, Kyungbok [1]
Dang, Minh [1]
Moon, Hyeonjoon [1]
Affiliations
[1] Sejong Univ, Dept Comp Sci & Engn, Seoul 05006, South Korea
Funding
National Research Foundation of Singapore
Keywords
Visualization; Artificial intelligence; Semantics; Convolutional neural networks; Recurrent neural networks; Predictive models; Linguistics; Image caption; story teller; deep learning; computer vision; context awareness;
DOI
10.1109/ACCESS.2021.3104276
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Research that applies artificial intelligence (AI) to generate captions for an image has been studied extensively in recent years. However, the generated captions are short, and the number of captions produced per image is limited. Moreover, it remains unclear whether a short story can be generated from an image, because many sentences must be connected coherently to form a fluent short story. Therefore, this study introduces an encoder-decoder framework that generates a short story caption (SSCap) from a common image caption dataset and a manually collected story corpus. The main contributions of this manuscript include 1) an unsupervised deep learning-based framework that combines a recurrent neural network (RNN) structure with an encoder-decoder model to compose a short story for an image, and 2) a large story corpus covering two genres (horror and romance), manually collected and validated. Extensive experiments demonstrated that the short stories created by the proposed model contain more creative content than those of existing systems, which can only produce concise sentences. The demonstrated framework therefore has the potential to motivate the development of a more robust AI story writer and encourages the integration of the proposed model into practical applications that help story writers find new ideas.
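As a rough illustration of the encoder-decoder idea summarized in the abstract, the sketch below (PyTorch) pairs a CNN image encoder with an autoregressive LSTM decoder that emits text tokens conditioned on the image embedding. The backbone choice (resnet18), layer sizes, and class names (ImageEncoder, StoryDecoder) are illustrative assumptions, not the authors' exact SSCap configuration; the sketch only conveys the overall image-to-text data flow, while the paper's actual model is additionally trained on the story corpus.

```python
# Minimal sketch of a CNN encoder + RNN decoder for image-conditioned text
# generation. Hyperparameters and the backbone are assumptions for illustration.
import torch
import torch.nn as nn
import torchvision.models as models


class ImageEncoder(nn.Module):
    """Encode an image into a fixed-size feature vector with a CNN backbone."""

    def __init__(self, embed_size: int = 256):
        super().__init__()
        backbone = models.resnet18(weights=None)  # pretrained weights optional
        # Drop the final classification layer, keep convolutional features + pooling.
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])
        self.fc = nn.Linear(backbone.fc.in_features, embed_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(images).flatten(1)       # (batch, 512)
        return self.fc(feats)                     # (batch, embed_size)


class StoryDecoder(nn.Module):
    """Autoregressive LSTM decoder that maps the image embedding to token logits."""

    def __init__(self, vocab_size: int, embed_size: int = 256, hidden_size: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, img_emb: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # Prepend the image embedding as the first "word" of the input sequence.
        word_emb = self.embed(tokens)                         # (batch, T, embed)
        inputs = torch.cat([img_emb.unsqueeze(1), word_emb], dim=1)
        hidden, _ = self.lstm(inputs)                         # (batch, T+1, hidden)
        return self.out(hidden)                               # logits over the vocabulary


if __name__ == "__main__":
    enc, dec = ImageEncoder(), StoryDecoder(vocab_size=10_000)
    images = torch.randn(2, 3, 224, 224)        # dummy image batch
    tokens = torch.randint(0, 10_000, (2, 20))  # dummy token ids
    logits = dec(enc(images), tokens)
    print(logits.shape)                         # torch.Size([2, 21, 10000])
```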
Pages: 113550 - 113557
Number of pages: 8