Cross-modality motion parameterization for fine-grained video prediction

Cited by: 2
Authors
Yan, Yichao [1 ]
Ni, Bingbing [1 ]
Zhang, Wendong [1 ]
Tang, Jun [1 ]
Yang, Xiaokang [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, 800 Dongchuan Rd, Shanghai 200240, Peoples R China
Funding
US National Science Foundation;
Keywords
Video generation; Cross-modality constraint; Adversarial learning;
DOI
10.1016/j.cviu.2019.03.006
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While predicting video content is challenging given the huge unconstrained search space, this work explores cross-modality constraints to safeguard the video generation process and improve content prediction. Observing the underlying correspondence between sound and object movement, we propose a novel cross-modality video generation network. Via adversarial training, this network directly links sound with the movement parameters of the operated object and automatically outputs the corresponding object motion according to the rhythm of the given audio signal. We experiment on both rigid-object and non-rigid-object motion prediction tasks and show that, guided by the associated audio information, our method significantly reduces motion uncertainty in the generated video content.
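The abstract's core idea — an adversarially trained generator that maps audio features to object motion parameters, with a discriminator judging (audio, motion) pairs — can be sketched in miniature. The NumPy toy below is not the authors' implementation: the linear models, least-squares GAN loss, dimensions, and synthetic audio-to-motion relationship are all illustrative assumptions, meant only to show the data flow of the cross-modality constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: per-frame audio feature and motion-parameter dimensions.
AUDIO_DIM, MOTION_DIM, N = 8, 4, 256

# Synthetic "ground truth": motion parameters linearly tied to the audio signal.
W_true = rng.normal(size=(AUDIO_DIM, MOTION_DIM))
audio = rng.normal(size=(N, AUDIO_DIM))
motion_real = audio @ W_true

# Linear generator G: audio features -> motion parameters.
Wg = rng.normal(scale=0.1, size=(AUDIO_DIM, MOTION_DIM))
# Linear discriminator D: scores an (audio, motion) pair as real (1) or fake (0).
wd = rng.normal(scale=0.1, size=(AUDIO_DIM + MOTION_DIM,))

lr = 0.005
for step in range(300):
    motion_fake = audio @ Wg
    x_real = np.hstack([audio, motion_real])  # paired audio + true motion
    x_fake = np.hstack([audio, motion_fake])  # paired audio + generated motion

    # Least-squares GAN discriminator update: push D(real)->1, D(fake)->0.
    grad_wd = 2.0 / N * (x_real.T @ (x_real @ wd - 1.0) + x_fake.T @ (x_fake @ wd))
    wd -= lr * grad_wd

    # Generator update: push D(audio, G(audio)) -> 1, i.e. fool the discriminator.
    s = x_fake @ wd            # discriminator scores on fake pairs
    wd_m = wd[AUDIO_DIM:]      # the part of D that reads the motion parameters
    grad_Wg = 2.0 / N * np.outer(audio.T @ (s - 1.0), wd_m)
    Wg -= lr * grad_Wg

# After training, the generator proposes motion parameters for new audio.
motion_pred = audio @ Wg
```

Because the discriminator sees audio and motion jointly, the generator is penalized for motion that is plausible in isolation but inconsistent with the audio — the "cross-modality constraint" that narrows the prediction search space.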
Pages: 11-19 (9 pages)