Grid Diffusion Models for Text-to-Video Generation

Cited by: 1
Authors
Lee, Taegyeong [1]
Kwon, Soyeong [1]
Kim, Taehwan [1]
Affiliation
[1] UNIST, Graduate School of Artificial Intelligence, Ulsan, South Korea
Source
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024 | 2024
Funding
National Research Foundation of Singapore
DOI
10.1109/CVPR52733.2024.00834
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent advances in diffusion models have significantly improved text-to-image generation. However, generating videos from text is more challenging than generating images from text, because it requires much larger datasets and higher computational cost. Most existing video generation methods use either a 3D U-Net architecture that models the temporal dimension or autoregressive generation. These methods require large datasets and are more computationally expensive than text-to-image generation. To tackle these challenges, we propose grid diffusion, a simple but effective text-to-video generation method that requires neither a temporal dimension in its architecture nor a large paired text-video dataset. By representing a video as a grid image, we can generate a high-quality video using a fixed amount of GPU memory regardless of the number of frames. Additionally, because our method reduces the dimensionality of a video to that of an image, various image-based methods can be applied to videos; for example, text-guided image manipulation can be extended to text-guided video manipulation. Our proposed method outperforms existing methods in both quantitative and qualitative evaluations, demonstrating the suitability of our model for real-world video generation.
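The core representational idea in the abstract, treating a video as a single grid image, can be illustrated with a small sketch. This is not the authors' implementation: the row-major layout, the grayscale frames stored as nested lists, and the helper names `frames_to_grid` and `grid_to_frames` are all hypothetical choices made for illustration. The abstract does not specify the grid size, the number of frames per grid, or any further processing steps. The sketch only shows that equally sized frames can be tiled into one image, which an image model could then handle, and recovered exactly afterwards.

```python
from typing import List

# Hypothetical representation: one grayscale frame as H rows of W pixel values.
Frame = List[List[float]]


def frames_to_grid(frames: List[Frame], rows: int, cols: int) -> Frame:
    """Tile rows * cols equally sized frames into one image, in row-major order."""
    if len(frames) != rows * cols:
        raise ValueError("need exactly rows * cols frames")
    h, w = len(frames[0]), len(frames[0][0])
    if any(len(f) != h or any(len(r) != w for r in f) for f in frames):
        raise ValueError("all frames must share the same height and width")
    grid: Frame = []
    for gr in range(rows):          # each grid row holds `cols` frames side by side
        for y in range(h):          # build one pixel row of the grid at a time
            row: List[float] = []
            for gc in range(cols):
                row.extend(frames[gr * cols + gc][y])
            grid.append(row)
    return grid


def grid_to_frames(grid: Frame, rows: int, cols: int) -> List[Frame]:
    """Inverse of frames_to_grid: cut the grid image back into its frames."""
    h, w = len(grid) // rows, len(grid[0]) // cols
    frames: List[Frame] = []
    for gr in range(rows):
        for gc in range(cols):
            frames.append([grid[gr * h + y][gc * w:(gc + 1) * w] for y in range(h)])
    return frames


if __name__ == "__main__":
    # Four 2x3 frames, each filled with its frame index, packed into a 2x2 grid.
    clip = [[[i] * 3 for _ in range(2)] for i in range(4)]
    image = frames_to_grid(clip, rows=2, cols=2)
    for pixel_row in image:
        print(pixel_row)
    assert grid_to_frames(image, rows=2, cols=2) == clip
```

Because the grid has a fixed resolution once its layout is chosen, an image-domain model sees an input of the same size every time. How the paper keeps GPU memory fixed as the total number of video frames grows is not described in this record and is not modeled here.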
Pages: 8734-8743
Number of pages: 10