Motion-Aware Feature Enhancement Network for Video Prediction

Cited by: 22
Authors
Lin, Xue [1]
Zou, Qi [1]
Xu, Xixia [1]
Huang, Yaping [1]
Tian, Yi [1]
Affiliation
[1] Beijing Jiaotong Univ, Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China
Keywords
Predictive models; Encoding; Multiprotocol label switching; Stochastic processes; Dynamics; Feature extraction; Task analysis; Video prediction; Unsupervised learning; Attention mechanism; Perceptual loss
DOI
10.1109/TCSVT.2020.2987141
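Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline code
0808; 0809
Abstract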
Video prediction is challenging due to the pixel-level precision requirement and the difficulty of capturing scene dynamics. Most approaches tackle these problems with pixel-level reconstruction objectives and two decomposed branches, yet they still suffer from blurry generations or dramatic degradation in long-term prediction. In this paper, we propose a Motion-Aware Feature Enhancement (MAFE) network for video prediction that produces realistic future frames and achieves relatively long-term predictions. First, a Channel-wise and Spatial Attention (CSA) module is designed to extract motion-aware features; it enhances the contribution of important motion details during encoding and subsequently improves the discriminability of the attention map used for frame refinement. Second, a Motion Perceptual Loss (MPL) is proposed to guide the learning of temporal cues, which benefits robust long-term video prediction. Extensive experiments on three human activity video datasets (KTH, Human3.6M, and PennAction) demonstrate the effectiveness of the proposed video prediction model compared with state-of-the-art approaches.
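The abstract does not spell out how the CSA module computes its attention. As a rough illustration only, the sketch below shows a generic channel-then-spatial attention gate in the style of CBAM (Woo et al., ECCV 2018); it is not the MAFE paper's CSA module. The matrix `weight` and the scalars `w_avg` and `w_max` are hypothetical stand-ins for learned parameters, and the 1x1 spatial combination is a simplification of the convolution CBAM uses at that step.

```python
import math
from typing import List

# A feature map is stored as [channel][row][col].
FeatureMap = List[List[List[float]]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(x: FeatureMap, weight: List[List[float]]) -> FeatureMap:
    """Rescale each channel by a gate computed from all channels' global averages.

    `weight` is a hypothetical C x C matrix standing in for a learned layer.
    """
    c = len(x)
    # Squeeze: global average pooling gives one descriptor per channel.
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excite: a linear map followed by a sigmoid gives one gate in (0, 1) per channel.
    gates = [sigmoid(sum(weight[i][j] * desc[j] for j in range(c))) for i in range(c)]
    return [[[v * gates[i] for v in row] for row in x[i]] for i in range(c)]


def spatial_attention(x: FeatureMap, w_avg: float, w_max: float) -> FeatureMap:
    """Rescale each spatial location by a gate built from channel-wise avg and max.

    Simplification (assumption): a 1x1 weighted sum with hypothetical scalars
    w_avg and w_max replaces CBAM's 7x7 convolution.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for r in range(h):
        for col in range(w):
            column = [x[k][r][col] for k in range(c)]
            gate = sigmoid(w_avg * sum(column) / c + w_max * max(column))
            for k in range(c):
                out[k][r][col] = x[k][r][col] * gate
    return out


def channel_spatial_attention(
    x: FeatureMap, weight: List[List[float]], w_avg: float = 1.0, w_max: float = 1.0
) -> FeatureMap:
    """Apply channel attention first, then spatial attention (CBAM ordering)."""
    return spatial_attention(channel_attention(x, weight), w_avg, w_max)


# Usage: a 2-channel, 2x2 feature map with all weights set to zero.
x = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
y = channel_spatial_attention(x, [[0.0, 0.0], [0.0, 0.0]], 0.0, 0.0)
print(y)
```

With all weights at zero every gate equals sigmoid(0) = 0.5, so each value is simply scaled by 0.25; learned weights would instead let informative channels and locations pass with gates closer to 1 while suppressing the rest.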
Pages: 688-700
Page count: 13