A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction

Cited by: 6

Authors:

Chatterjee, Moitreya [1]

Ahuja, Narendra [1]

Cherian, Anoop [2]

Affiliations:

[1] Univ Illinois, Champaign, IL 61820 USA

[2] Mitsubishi Elect Res Labs, Cambridge, MA 02139 USA

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

Funding:

National Institute of Food and Agriculture (USA);

Keywords:

DOI:

10.1109/ICCV48922.2021.00961

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Predicting the future frames of a video is a challenging task, in part due to the underlying stochastic real-world phenomena. Prior approaches to this task typically estimate a latent prior characterizing this stochasticity, but do not account for the predictive uncertainty of the (deep learning) model. Such approaches often derive the training signal from the mean-squared error (MSE) between the generated frame and the ground truth, which can lead to sub-optimal training, especially when the predictive uncertainty is high. To this end, we introduce the Neural Uncertainty Quantifier (NUQ), a stochastic quantification of the model's predictive uncertainty, and use it to weight the MSE loss. We propose a hierarchical, variational framework to derive NUQ in a principled manner using a deep, Bayesian graphical model. Our experiments on three benchmark stochastic video prediction datasets show that our proposed framework trains more effectively than state-of-the-art models (especially when the training sets are small), while demonstrating better video generation quality and diversity across several evaluation metrics.

Citation

Pages: 9731-9741

Page count: 11

References: 63 in total
