Story-to-Images Translation: Leveraging Diffusion Models and Large Language Models for Sequence Image Generation

Citations: 0
Authors
Kumagai, Haruka [1]
Yamaki, Ryosuke [2, 3]
Naganuma, Hiroki [3, 4]
Affiliations
[1] Univ Tokyo, Tokyo, Japan
[2] Ritsumeikan Univ, Shiga, Japan
[3] ProPlace Inc, Tokyo, Japan
[4] Univ Montreal, Mila, Montreal, PQ, Canada
Source
PROCEEDINGS OF THE 2ND WORKSHOP ON USER-CENTRIC NARRATIVE SUMMARIZATION OF LONG VIDEOS, NARSUM 2023 | 2023
Keywords
large language model; diffusion model; text-to-image generation
DOI
10.1145/3607540.3617144
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Diffusion models are catalyzing breakthroughs in creative fields, with a notable impact on text-to-image generation. This study centers on the transformation of textual narratives into coherent sequences of images, a process currently hampered by issues of consistency and contextual fidelity. To address these challenges, we propose a method utilizing a large language model, with an emphasis on context and character information. Empirical evaluations, carried out using Hollywood movie scripts, indicate that our approach improves both the consistency and contextual fidelity of the resulting image sequences.
Pages: 57-63
Page count: 7