Seformer: a long sequence time-series forecasting model based on binary position encoding and information transfer regularization

Cited by: 8
Authors
Zeng, Pengyu [1 ,2 ,3 ,4 ]
Hu, Guoliang [1 ,2 ,3 ]
Zhou, Xiaofeng [1 ,2 ,3 ]
Li, Shuai [1 ,2 ,3 ]
Liu, Pengjie [1 ,2 ,3 ,4 ]
Affiliations
[1] Chinese Acad Sci, Key Lab Networked Control Syst, Shenyang 110000, Peoples R China
[2] Chinese Acad Sci, Shenyang Inst Automat, Shenyang 110000, Peoples R China
[3] Chinese Acad Sci, Institutes Robot & Intelligent Mfg, Shenyang 110000, Peoples R China
[4] Univ Chinese Acad Sci, Beijing 100000, Beijing, Peoples R China
Keywords
Long sequence time-series forecasting; Transformer; Position encoding; Regularization method; Conditional variational autoencoder;
DOI
10.1007/s10489-022-04263-z
CLC Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Long sequence time-series forecasting (LSTF) problems, such as weather forecasting, stock market forecasting, and power resource management, are widespread in the real world and demand models with high prediction accuracy. Recent studies have shown that, among competing architectures, the transformer is the most promising model structure for LSTF problems. Because the transformer is permutation-equivariant, sequence position encoding is an essential step in model training. Continuous dynamics models built for position encoding with neural ordinary differential equations (neural ODEs) can model sequence position information well. However, we find that neural ODEs face several limitations when applied to the LSTF problem, including high time cost, baseline drift, and information loss; thus, they cannot be applied to the LSTF problem directly. To address these problems, we design a binary position encoding-based regularization model for long sequence time-series forecasting, named Seformer, with the following structure: 1) A binary position encoding mechanism, comprising intrablock and interblock position encoding. For intrablock position encoding, we design a simple ODE method by discretizing the continuous dynamics model, which reduces the time cost of computing neural ODEs while preserving their dynamics properties as far as possible. For interblock position encoding, a chunked recursive form is adopted to alleviate the baseline drift caused by eigenvalue explosion. 2) An information transfer regularization mechanism: by regularizing the model's intermediate hidden variables as well as the encoder-decoder connection variables, we reduce information loss during training while ensuring the smoothness of the position information.
Extensive experimental results obtained on six large-scale datasets show a consistent improvement in our approach over the baselines.
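The abstract does not give enough detail to reproduce Seformer, but the two-level ("binary") position encoding idea it describes can be sketched: within a block, positions evolve by cheap discrete steps of a continuous dynamics model (an explicit Euler discretization of an ODE, rather than a full neural-ODE solve), and the recursion restarts at each block boundary so the state trajectory cannot drift without bound. In the minimal sketch below, the dynamics function `tanh(W p)`, the step size, the block length, and the per-block re-initialization are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def binary_position_encoding(seq_len, d_model, block=24, step=0.1, seed=0):
    """Hypothetical sketch of a two-level position encoding.

    Intrablock: the position state evolves by explicit Euler steps of the
    ODE dp/dt = tanh(W p), one step per position.
    Interblock: the state is re-initialized at each block boundary (a
    chunked recursion), which caps how far the trajectory can drift.
    """
    rng = np.random.default_rng(seed)
    # Random dynamics matrix, scaled to keep the trajectory stable.
    W = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    n_blocks = -(-seq_len // block)  # ceil division
    # Stand-in for per-block initial states (learned in a real model).
    block_init = rng.standard_normal((n_blocks, d_model)) * 0.1
    enc = np.zeros((seq_len, d_model))
    for b in range(n_blocks):
        state = block_init[b].copy()  # interblock: fresh state per block
        for t in range(b * block, min((b + 1) * block, seq_len)):
            enc[t] = state
            state = state + step * np.tanh(W @ state)  # one Euler step
    return enc

enc = binary_position_encoding(seq_len=96, d_model=16)
print(enc.shape)  # (96, 16)
```

Each position receives a distinct, smoothly varying vector, while the per-block restart keeps the state bounded over long sequences; this is the property the abstract attributes to the chunked recursive form.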
Pages: 15747-15771 (25 pages)