Seformer: a long sequence time-series forecasting model based on binary position encoding and information transfer regularization

Authors
Pengyu Zeng
Guoliang Hu
Xiaofeng Zhou
Shuai Li
Pengjie Liu
Affiliations
[1] Chinese Academy of Sciences, Key Laboratory of Networked Control Systems
[2] Chinese Academy of Sciences, Shenyang Institute of Automation
[3] Chinese Academy of Sciences, Institutes for Robotics and Intelligent Manufacturing
[4] University of Chinese Academy of Sciences
Source
Applied Intelligence | 2023 / Vol. 53
Keywords
Long sequence time-series forecasting; Transformer; Position encoding; Regularization method; Conditional variational autoencoder
Abstract
Long sequence time-series forecasting (LSTF) problems, such as weather forecasting, stock market forecasting, and power resource management, are widespread in the real world. The LSTF problem requires a model with high prediction accuracy. Recent studies have shown that the transformer is the most promising model architecture for LSTF problems. Because the transformer is permutation equivariant, sequence position encoding is an essential part of model training. Currently, continuous dynamics models constructed for position encoding with neural ordinary differential equations (neural ODEs) can model sequence position information well. However, we have found that neural ODEs have several limitations when applied to the LSTF problem, including the time cost problem, the baseline drift problem, and the information loss problem; thus, neural ODEs cannot be directly applied to the LSTF problem. To address these limitations, we design a binary position encoding-based regularization model for long sequence time-series prediction, named Seformer, which has the following structure: 1) A binary position encoding mechanism, comprising intrablock and interblock position encoding. For intrablock position encoding, we design a simple ODE method by discretizing the continuous dynamics model, which reduces the time cost of computing neural ODEs while preserving their dynamics properties as far as possible. For interblock position encoding, a chunked recursive form is adopted to alleviate the baseline drift problem caused by eigenvalue explosion. 2) An information transfer regularization mechanism: by regularizing the model's intermediate hidden variables as well as the encoder-decoder connection variables, we reduce information loss during model training while ensuring the smoothness of the position information. Extensive experimental results obtained on six large-scale datasets show a consistent improvement of our approach over the baselines.
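The abstract's point about permutation and position encoding can be illustrated with a minimal sketch. This is not Seformer's encoding: it assumes a single attention head with identity query/key/value projections, and `add_position` is a hypothetical linear position offset chosen only for illustration. It shows that, without position information, permuting the input sequence merely permutes the attention output, so order carries no signal.

```python
import math
import random


def self_attention(x):
    """Single-head self-attention with identity Q/K/V projections (a simplification)."""
    d = len(x[0])
    out = []
    for q in x:
        # Scaled dot-product scores of this query against every row.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Output row is the softmax-weighted average of the value rows.
        out.append([sum(w / z * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def add_position(x, scale=0.1):
    """Hypothetical position encoding: add scale * index to every feature."""
    return [[v + scale * i for v in row] for i, row in enumerate(x)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


rng = random.Random(0)
x = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
perm = [3, 0, 5, 1, 4, 2]
x_perm = [x[i] for i in perm]

# Without position information: permuting the input only permutes the output.
y = self_attention(x)
y_perm = self_attention(x_perm)
gap_plain = max_abs_diff(y_perm, [y[i] for i in perm])

# With position information: the same permutation now changes the result.
z = self_attention(add_position(x))
z_perm = self_attention(add_position(x_perm))
gap_pos = max_abs_diff(z_perm, [z[i] for i in perm])

print(f"gap without positions: {gap_plain:.2e}")
print(f"gap with positions:    {gap_pos:.2e}")
```

The first gap should sit at floating-point rounding level, while the second should be clearly nonzero, which is why a transformer applied to time series needs some form of position encoding.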
Pages: 15747-15771
Page count: 24