Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning

Cited by: 93
Authors
Lee, Sangmin [1 ]
Kim, Hak Gu [2 ]
Choi, Dae Hwi [1 ]
Kim, Hyung-Il [1 ,3 ]
Ro, Yong Man [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Image & Video Syst Lab, Daejeon, South Korea
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[3] ETRI, Daejeon, South Korea
Source
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021
DOI
10.1109/CVPR46437.2021.00307
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Our work addresses long-term motion context issues in predicting future frames. To predict the future precisely, a model must capture which long-term motion context (e.g., walking or running) the input motion (e.g., leg movement) belongs to. The bottlenecks in handling long-term motion context are: (i) how to predict a long-term motion context that naturally matches input sequences with limited dynamics, and (ii) how to predict high-dimensional long-term motion context (e.g., complex motion). To address these issues, we propose a novel motion context-aware video prediction method. To solve bottleneck (i), we introduce a long-term motion context memory (LMC-Memory) with memory alignment learning. The proposed memory alignment learning makes it possible to store long-term motion contexts in the memory and to match them with sequences containing only limited dynamics; as a result, the long-term context can be recalled from a limited input sequence. To resolve bottleneck (ii), we propose memory query decomposition, which stores local motion contexts (i.e., low-dimensional dynamics) and recalls the suitable local context for each local part of the input individually, boosting the alignment effect of the memory. Experimental results show that the proposed method outperforms other sophisticated RNN-based methods, especially under long-term conditions. Further, we validate the effectiveness of the proposed network designs through ablation studies and memory feature analysis. The source code of this work is available.†
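The recall mechanism the abstract describes — each local part of the input issuing its own query against a memory of motion contexts — can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `recall_local_contexts`, the cosine-similarity addressing, and the softmax weighting are assumptions standing in for the paper's actual LMC-Memory and alignment-learning design.

```python
import numpy as np

def recall_local_contexts(feature_map, memory, eps=1e-8):
    """Hypothetical sketch of decomposed memory recall.

    feature_map: (H, W, C) local query features from the input sequence
    memory:      (S, C) slots assumed to hold long-term motion contexts

    Each spatial location issues its own query (query decomposition),
    attends over the memory slots by cosine similarity, and recalls a
    weighted sum of slots as its local long-term context.
    """
    H, W, C = feature_map.shape
    queries = feature_map.reshape(-1, C)                          # (H*W, C)
    q = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + eps)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + eps)
    sim = q @ m.T                                                 # (H*W, S)
    sim -= sim.max(axis=1, keepdims=True)                         # stable softmax
    attn = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    recalled = attn @ memory                                      # (H*W, C)
    return recalled.reshape(H, W, C)

rng = np.random.default_rng(0)
ctx = recall_local_contexts(rng.standard_normal((4, 4, 8)),
                            rng.standard_normal((16, 8)))
print(ctx.shape)  # (4, 4, 8)
```

Decomposing the query per location, rather than compressing the whole frame into one global query, is what lets each local region match a low-dimensional motion pattern on its own, which is the intuition behind bottleneck (ii) above.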
Pages: 3053-3062
Page count: 10