Motion-Aware Feature Enhancement Network for Video Prediction

Cited by: 22

Authors:

Lin, Xue [1]

Zou, Qi [1]

Xu, Xixia [1]

Huang, Yaping [1]

Tian, Yi [1]

Affiliation:

[1] Beijing Jiaotong Univ, Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2021 / Vol. 31 / Issue 02

Keywords:

Predictive models; Encoding; Multiprotocol label switching; Stochastic processes; Dynamics; Feature extraction; Task analysis; Video prediction; unsupervised learning; attention mechanism; perceptual loss;

DOI:

10.1109/TCSVT.2020.2987141

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Video prediction is challenging due to the pixel-level precision requirement and the difficulty of capturing scene dynamics. Most approaches tackle these problems with pixel-level reconstruction objectives and two decomposed branches, but they still suffer from blurry generations or dramatic degradation in long-term prediction. In this paper, we propose a Motion-Aware Feature Enhancement (MAFE) network for video prediction to produce realistic future frames and achieve relatively long-term predictions. First, a Channel-wise and Spatial Attention (CSA) module is designed to extract motion-aware features, which enhances the contribution of important motion details during encoding and subsequently improves the discriminability of the attention map used for frame refinement. Second, a Motion Perceptual Loss (MPL) is proposed to guide the learning of temporal cues, which benefits robust long-term video prediction. Extensive experiments on three human activity video datasets (KTH, Human3.6M, and PennAction) demonstrate the effectiveness of the proposed video prediction model compared with state-of-the-art approaches.


Pages: 688-700

Number of pages: 13

