DANet: A spatio-temporal dynamics and Detail Aware Network for video prediction

Cited by: 0
Authors
Huang, Huilin [1 ]
Guan, YePeng [1 ,2 ,3 ]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, Shanghai 200444, Peoples R China
[2] Minist Educ, Key Lab Adv Display & Syst Applicat, Shanghai 200072, Peoples R China
[3] Shanghai Univ, Key Lab Silicate Cultural Rel Conservat, Minist Educ, Shanghai 200444, Peoples R China
Keywords
Video prediction; Spatiotemporal dynamics; Detail information; Motion patterns
DOI
10.1016/j.neucom.2024.128023
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Video prediction aims to predict upcoming future frames by modeling the complex spatiotemporal dynamics of given videos. However, most existing video prediction methods still perform sub-optimally in generating high-visual-quality future frames. The reasons are: 1) these methods struggle to reason about accurate future motion because they extract insufficient spatiotemporal correlations from the given frames; 2) the state transition units in previous works are complex, which inevitably results in the loss of spatial details. When videos contain variable motion patterns (e.g., rapid movement of objects) and complex spatial information (e.g., texture details), blurring artifacts and the local absence of objects may occur in the predicted frames. In this work, to predict more accurate future motion and preserve more detail information, we propose an end-to-end trainable dual-branch video prediction framework, the spatiotemporal Dynamics and Detail Aware Network (DANet). Specifically, to predict future motion, we propose a SpatioTemporal Memory (ST-Memory) that learns motion evolution in the temporal domain from the given frames by transmitting deep features along a zigzag direction. To obtain adequate spatiotemporal correlations among frames, a MotionCell is constructed in the ST-Memory to facilitate the expansion of the receptive field. Spatiotemporal attention is employed in the ST-Memory to focus on the global variation of the given frames. Additionally, to preserve useful spatial details, we design a Spatial Details Memory (SD-Memory) to capture the global and local dependencies of the given frames at the pixel level. Extensive experiments conducted on three public datasets, covering both synthetic and natural videos, demonstrate that DANet achieves excellent video prediction performance compared with state-of-the-art methods. In brief, DANet outperforms the state-of-the-art methods in terms of MSE by 3.1, 1.0 x 10^-2, and 14.3 x 10 on the three public benchmark datasets, respectively.
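The abstract outlines a dual-branch design: a recurrent ST-Memory branch, built from MotionCell units with spatiotemporal attention, that models motion evolution, and an SD-Memory branch that keeps pixel-level detail, with the two fused to decode the next frame. The following PyTorch-style sketch shows one plausible way such a layout could be wired. The layer shapes, the attention form, the single-layer state routing, and the concatenation-based fusion are all illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class MotionCell(nn.Module):
    """Hypothetical recurrent cell: a dilated convolution widens the
    receptive field, and self-attention over spatial tokens stands in
    for the abstract's spatiotemporal attention on global variation."""
    def __init__(self, channels):
        super().__init__()
        self.update = nn.Conv2d(2 * channels, channels, 3, padding=2, dilation=2)
        self.attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)

    def forward(self, x, h):
        h = torch.tanh(self.update(torch.cat([x, h], dim=1)))
        b, c, hh, ww = h.shape
        tokens = h.flatten(2).transpose(1, 2)            # (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)  # global spatial mixing
        return h + attended.transpose(1, 2).reshape(b, c, hh, ww)

class DANetSketch(nn.Module):
    """Dual-branch layout: a recurrent dynamics branch plus a per-frame
    detail branch, fused before decoding the next frame."""
    def __init__(self, channels=64):
        super().__init__()
        self.channels = channels
        self.encoder = nn.Conv2d(3, channels, 3, padding=1)
        self.st_memory = MotionCell(channels)              # dynamics branch
        self.sd_memory = nn.Conv2d(channels, channels, 1)  # detail branch, pixel-level
        self.decoder = nn.Conv2d(2 * channels, 3, 3, padding=1)

    def forward(self, frames):
        # frames: (B, T, 3, H, W). The paper's zigzag routing across
        # stacked layers and time steps is collapsed to one layer here.
        b, t, _, h, w = frames.shape
        hidden = torch.zeros(b, self.channels, h, w, device=frames.device)
        for i in range(t):
            feat = self.encoder(frames[:, i])
            hidden = self.st_memory(feat, hidden)  # motion state across time
            detail = self.sd_memory(feat)          # details of the latest frame
        return self.decoder(torch.cat([hidden, detail], dim=1))

# Usage: predict one future frame from ten 64x64 context frames.
pred = DANetSketch()(torch.randn(2, 10, 3, 64, 64))  # -> (2, 3, 64, 64)

In practice such models are rolled out autoregressively, feeding each predicted frame back in to generate longer horizons; this sketch shows only a single prediction step.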
Pages: 11