TiVGAN: Text to Image to Video Generation With Step-by-Step Evolutionary Generator

Cited by: 25

Authors:

Kim, Doyeon [1]

Joo, Donggyu [1]

Kim, Junmo [1]

Affiliations:

[1] Korea Adv Inst Sci & Technol, Sch Elect Engn, Daejeon 34141, South Korea

Source:

IEEE Access | 2020 / Vol. 8

Keywords:

Computer vision; deep learning; generative adversarial networks; video generation; text-to-video generation;

DOI:

10.1109/ACCESS.2020.3017881

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Advances in technology have led to the development of methods that can create desired visual multimedia. In particular, image generation using deep learning has been extensively studied across diverse fields. In comparison, video generation, especially when conditioned on an input, remains a challenging and less explored area. To narrow this gap, we aim to train our model to produce a video corresponding to a given text description. We propose a novel training framework, Text-to-Image-to-Video Generative Adversarial Network (TiVGAN), which evolves frame by frame and finally produces a full-length video. In the first phase, we focus on creating a high-quality single video frame while learning the relationship between the text and an image. As the steps proceed, our model is trained gradually on an increasing number of consecutive frames. This step-by-step learning process helps stabilize the training and enables the creation of high-resolution video based on conditional text descriptions. Qualitative and quantitative experimental results on various datasets demonstrate the effectiveness of the proposed method.


Pages: 153113-153122

Page count: 10
