CNN-Based Time Series Decomposition Model for Video Prediction

Times Cited: 0
Authors
Lee, Jinyoung [1 ]
Kim, Gyeyoung [1 ]
Affiliations
[1] Soongsil Univ, Sch Software, Seoul 06978, South Korea
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Predictive models; Convolutional neural networks; Spatiotemporal phenomena; Time series analysis; Forecasting; Transformers; Data models; Deep learning; deep learning architecture; spatiotemporal representation learning; time series forecasting; video prediction; NETWORK;
DOI
10.1109/ACCESS.2024.3458460
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
Video prediction presents a formidable challenge, requiring effective processing of the spatial and temporal information embedded in videos. While recurrent neural network (RNN) and transformer-based models have been extensively explored to address spatial changes over time, recent advancements in convolutional neural networks (CNNs) have yielded high-performance video prediction models. CNN-based models offer advantages over RNN and transformer-based models due to their ease of parallel processing and lower computational complexity, highlighting their significance in practical applications. However, existing CNN-based video prediction models typically treat the spatiotemporal channels of videos similarly to the channel axis of static images: they stack frames in temporal order to construct a spatiotemporal axis and apply standard 1 x 1 convolution operations. This approach has limitations. Applying a 1 x 1 convolution directly to the spatiotemporal axis mixes temporal and spatial information, which may lead to computational inefficiency and reduced accuracy. Moreover, such an operation is poorly suited to processing temporal data. This study introduces a CNN-based time series decomposition model for video prediction. The proposed model first divides the 1 x 1 convolution operation within the channel aggregation module to process the temporal and spatial dimensions independently. To capture evolving features, the temporal axis is separated into trend and residual components, to which a time series decomposition forecasting method is applied. To assess the performance of the proposed technique, experiments were conducted on the Moving MNIST, KTH, and KITTI-Caltech benchmark datasets. In the experiments on Moving MNIST, despite a reduction of approximately 55% in the number of parameters and 37% in computational cost, the proposed method improved accuracy by up to 7% compared to the previous approach.
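The two ideas in the abstract can be illustrated with a minimal numpy sketch: mixing the temporal axis and the channel axis with separate weight matrices rather than one joint 1 x 1 convolution over the stacked T*C channels, and splitting the temporal axis into a slow trend (here, a moving average over time) plus a residual. All function and variable names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def factorized_mixing(x, W_t, W_c):
    """Mix the temporal and channel axes independently, instead of one
    1x1 conv over the stacked T*C channels.
    x: (T, C, H, W); W_t: (T', T) temporal weights; W_c: (C', C)
    channel weights. Names are illustrative, not the paper's API."""
    y = np.einsum('st,tchw->schw', W_t, x)      # temporal mixing only
    return np.einsum('dc,schw->sdhw', W_c, y)   # channel mixing only

def trend_residual(x, k=3):
    """Split the temporal axis into a smooth trend (moving average of
    window k over time) and a fast-changing residual component."""
    T, pad = x.shape[0], k // 2
    # Replicate-pad the temporal axis so the trend keeps length T.
    padded = np.concatenate(
        [np.repeat(x[:1], pad, axis=0), x,
         np.repeat(x[-1:], pad, axis=0)], axis=0)
    trend = np.stack([padded[t:t + k].mean(axis=0) for t in range(T)])
    return trend, x - trend

# Toy "video": 4 frames, 2 channels, 8x8 pixels.
x = np.random.default_rng(0).normal(size=(4, 2, 8, 8))
y = factorized_mixing(x, np.eye(4), np.eye(2))
assert np.allclose(y, x)  # identity weights leave the input unchanged
trend, resid = trend_residual(x)
assert np.allclose(trend + resid, x)  # decomposition is exact
```

The factorized form uses T'*T + C'*C weights where a joint 1 x 1 convolution over the flattened spatiotemporal axis would use (T'*C')*(T*C), which is consistent with the parameter reduction the abstract reports.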
Pages: 131205-131216
Page Count: 12