Persistence Initialization: a novel adaptation of the Transformer architecture for time series forecasting

Cited by: 10
Authors
Haugsdal, Espen [1]
Aune, Erlend [1,2]
Ruocco, Massimiliano [1,3]
Affiliations
[1] Norwegian Univ Sci & Technol, Trondheim, Norway
[2] BI Norwegian Business Sch, Oslo, Norway
[3] Sintef Digital, Trondheim, Norway
Keywords
Transformer; Time series forecasting; M4 competition; Deep neural networks
DOI
10.1007/s10489-023-04927-4
CLC classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Time series forecasting is an important problem with many real-world applications. Transformer models have been applied with great success to natural language processing tasks, but have received relatively little attention in time series forecasting. Motivated by the differences between classification tasks and forecasting, we propose PI-Transformer, an adaptation of the Transformer architecture designed for time series forecasting, consisting of three parts. First, we propose a novel initialization method called Persistence Initialization, which aims to increase the training stability of forecasting models by ensuring that the initial outputs of an untrained model are identical to the outputs of a simple baseline model. Second, we use ReZero normalization instead of Layer Normalization, in order to further address issues related to training stability. Third, we use Rotary positional encodings to provide a better inductive bias for forecasting. Multiple ablation studies show that PI-Transformer is more accurate, learns faster, and scales better than regular Transformer models. Finally, PI-Transformer achieves competitive performance on the challenging M4 dataset, both against the current state of the art and against recently proposed Transformer models for time series forecasting.
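The core idea of Persistence Initialization described in the abstract can be sketched in a few lines: the model's output is added to a naive persistence forecast (repeat the last observed value) through a learnable scalar gate initialized to zero, so that before any training the combined model reproduces the baseline exactly. The sketch below is a minimal illustration under these assumptions; the wrapper name, the scalar gate, and the plain-function "model" are hypothetical stand-ins for the paper's actual architecture.

```python
import numpy as np

def persistence_forecast(history, horizon):
    # Naive persistence baseline: repeat the last observed value.
    return np.full(horizon, history[-1], dtype=float)

class PIWrapper:
    """Hypothetical sketch of Persistence Initialization.

    The wrapped model's output is scaled by a gate `alpha` that starts
    at zero (as in ReZero), so an untrained model's forecast is exactly
    the persistence baseline; training then learns to deviate from it.
    """
    def __init__(self, model):
        self.model = model   # any callable: (history, horizon) -> array
        self.alpha = 0.0     # learnable scalar, zero at initialization

    def forecast(self, history, horizon):
        baseline = persistence_forecast(history, horizon)
        return baseline + self.alpha * self.model(history, horizon)

# With alpha == 0, even a random model yields the persistence forecast.
untrained = PIWrapper(lambda h, n: np.random.randn(n))
hist = np.array([1.0, 2.0, 3.0])
print(untrained.forecast(hist, 4))  # → [3. 3. 3. 3.]
```

In a real implementation `alpha` would be a trainable parameter updated by gradient descent, which is what makes the initial behavior a baseline rather than a fixed constraint.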
Pages: 26781-26796
Page count: 16