Deep Learning-Based Short Story Generation for an Image Using the Encoder-Decoder Structure

Citations: 9
Authors
Min, Kyungbok [1]
Dang, Minh [1]
Moon, Hyeonjoon [1]
Affiliations
[1] Sejong Univ, Dept Comp Sci & Engn, Seoul 05006, South Korea
Funding
National Research Foundation of Singapore
Keywords
Visualization; Artificial intelligence; Semantics; Convolutional neural networks; Recurrent neural networks; Predictive models; Linguistics; Image caption; story teller; deep learning; computer vision; context awareness;
DOI
10.1109/ACCESS.2021.3104276
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Research that applies artificial intelligence (AI) to generate captions for an image has been studied extensively in recent years. However, the generated captions are short, and the number of captions produced per image is limited. Moreover, it remains unclear whether a short story can be generated from an image, because many sentences must be connected coherently to form a fluent short story. Therefore, this study introduces an encoder-decoder framework that generates a short story caption (SSCap) from a common image caption dataset and a manually collected story corpus. The main contributions of this manuscript include 1) an unsupervised deep learning-based framework that combines a recurrent neural network (RNN) structure with an encoder-decoder model to compose a short story for an image, and 2) a large story corpus covering two genres (horror and romance), manually collected and validated. Extensive experiments demonstrated that the short stories created by the proposed model contain more creative content than those of existing systems, which can only produce concise sentences. The demonstrated framework therefore has the potential to motivate the development of a more robust AI story writer and encourages the integration of the proposed model into practical applications that help story writers find new ideas.
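As a rough illustration of the encoder-decoder idea summarized in the abstract, the sketch below (PyTorch) pairs a CNN image encoder with an autoregressive LSTM decoder that emits text tokens conditioned on the image embedding. The backbone choice (resnet18), layer sizes, and class names (ImageEncoder, StoryDecoder) are illustrative assumptions, not the authors' exact SSCap configuration; the sketch only conveys the overall image-to-text data flow, while the paper's actual model is additionally trained on the story corpus.

```python
# Minimal sketch of a CNN encoder + RNN decoder for image-conditioned text
# generation. Hyperparameters and the backbone are assumptions for illustration.
import torch
import torch.nn as nn
import torchvision.models as models


class ImageEncoder(nn.Module):
    """Encode an image into a fixed-size feature vector with a CNN backbone."""

    def __init__(self, embed_size: int = 256):
        super().__init__()
        backbone = models.resnet18(weights=None)  # pretrained weights optional
        # Drop the final classification layer, keep convolutional features + pooling.
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])
        self.fc = nn.Linear(backbone.fc.in_features, embed_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(images).flatten(1)       # (batch, 512)
        return self.fc(feats)                     # (batch, embed_size)


class StoryDecoder(nn.Module):
    """Autoregressive LSTM decoder that maps the image embedding to token logits."""

    def __init__(self, vocab_size: int, embed_size: int = 256, hidden_size: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, img_emb: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # Prepend the image embedding as the first "word" of the input sequence.
        word_emb = self.embed(tokens)                         # (batch, T, embed)
        inputs = torch.cat([img_emb.unsqueeze(1), word_emb], dim=1)
        hidden, _ = self.lstm(inputs)                         # (batch, T+1, hidden)
        return self.out(hidden)                               # logits over the vocabulary


if __name__ == "__main__":
    enc, dec = ImageEncoder(), StoryDecoder(vocab_size=10_000)
    images = torch.randn(2, 3, 224, 224)        # dummy image batch
    tokens = torch.randint(0, 10_000, (2, 20))  # dummy token ids
    logits = dec(enc(images), tokens)
    print(logits.shape)                         # torch.Size([2, 21, 10000])
```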
Pages: 113550 - 113557
Number of pages: 8