Story-to-Images Translation: Leveraging Diffusion Models and Large Language Models for Sequence Image Generation

Cited by: 0
Authors
Kumagai, Haruka [1]
Yamaki, Ryosuke [2, 3]
Naganuma, Hiroki [3, 4]
Affiliations
[1] Univ Tokyo, Tokyo, Japan
[2] Ritsumeikan Univ, Shiga, Japan
[3] ProPlace Inc, Tokyo, Japan
[4] Univ Montreal, Mila, Montreal, PQ, Canada
Source
PROCEEDINGS OF THE 2ND WORKSHOP ON USER-CENTRIC NARRATIVE SUMMARIZATION OF LONG VIDEOS, NARSUM 2023 | 2023
Keywords
large language model; diffusion model; text-to-image generation
DOI
10.1145/3607540.3617144
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Diffusion models are catalyzing breakthroughs in creative fields, with a notable impact on text-to-image generation. This study centers on the transformation of textual narratives into coherent sequences of images, a process currently hampered by issues of consistency and contextual fidelity. To address these challenges, we propose a method utilizing a large language model, with an emphasis on context and character information. Empirical evaluations, carried out using Hollywood movie scripts, clearly indicate that our approach improves both the consistency and contextual fidelity of the resulting image sequences.
Pages: 57-63
Number of pages: 7