PWDformer: Deformable transformer for long-term series forecasting

Cited by: 11
Authors
Wang, Zheng [1 ,2 ,3 ]
Ran, Haowei [1 ,2 ,3 ]
Ren, Jinchang [4 ,5 ]
Sun, Meijun [1 ,2 ,3 ]
Affiliations
[1] Tianjin Univ, 135 Yaguan Rd, Tianjin 300192, Peoples R China
[2] Tianjin Univ, Tianjin Key Lab Machine Learning, Tianjin 300192, Peoples R China
[3] Minist Educ Peoples Republ China, Engn Res Ctr City intelligence & Digital Governanc, Tianjin 300192, Peoples R China
[4] Robert Gordon Univ, Natl Subsea Ctr, 3 Int Ave, Aberdeen AB21 0BH, Scotland
[5] Robert Gordon Univ, Sch Comp, 3 Int Ave, Aberdeen AB21 0BH, Scotland
Funding
National Natural Science Foundation of China;
Keywords
Long-term forecasting; Time series forecasting; Deep learning; Transformer;
DOI
10.1016/j.patcog.2023.110118
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Long-term forecasting is of paramount importance in numerous scenarios, including predicting future energy, water, and food consumption. For instance, extreme weather events and natural disasters can profoundly impact infrastructure operations and pose severe safety concerns. Traditional CNN-based models often struggle to capture long-distance dependencies effectively. In contrast, Transformer-based models have shown significant promise in long-term forecasting. This paper investigates the long-term forecasting problem and identifies a common limitation in existing Transformer-based models: they tend to reduce computational complexity at the expense of time-information aggregation capability. Moreover, the order of a time series plays a crucial role in accurate prediction, but current Transformer-based models lack sensitivity to time-series order, which undermines the reliability of their predictions. To address these issues, we propose a novel Deformable-Local (DL) aggregation mechanism. This mechanism enhances the model's ability to aggregate time information and allows the model to adaptively adjust the size of the time-aggregation window. Consequently, the model can discern more complex temporal patterns, leading to more accurate predictions. Additionally, our model incorporates a Frequency Selection module to reinforce effective features and reduce noise. Furthermore, we introduce Position Weights to mitigate the order-insensitivity problem present in existing methods. In extensive evaluations of long-term forecasting tasks, we conducted benchmark tests on six datasets covering various practical applications, including energy, traffic, economics, weather, and disease. Our method achieved state-of-the-art (SOTA) results, demonstrating significant improvements. For instance, on the ETT dataset, our model achieved an average MSE improvement of approximately 19% and an average MAE improvement of around 27%.
Remarkably, for prediction lengths of 96 and 192, we achieved outstanding MSE and MAE improvements of 32.1% and 30.9%, respectively.
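The Frequency Selection idea described above (reinforce effective features, suppress noise) can be illustrated with a minimal sketch: transform the series to the frequency domain, keep only the strongest components, and transform back. This is a generic illustration of frequency-domain filtering, not the paper's actual module; the function name `frequency_selection` and the top-`k` magnitude criterion are assumptions for demonstration.

```python
import numpy as np

def frequency_selection(x, k=3):
    """Hypothetical sketch of a frequency-selection step: keep the k
    frequency components with the largest magnitude and zero the rest."""
    spec = np.fft.rfft(x)                      # real-input FFT of the series
    keep = np.argsort(np.abs(spec))[-k:]       # indices of the k strongest bins
    filtered = np.zeros_like(spec)             # complex zeros, same shape
    filtered[keep] = spec[keep]                # retain only the selected bins
    return np.fft.irfft(filtered, n=len(x))   # back to the time domain

# Demo: recovering a sine buried in noise.
rng = np.random.default_rng(0)
t = np.arange(256)
clean = np.sin(2 * np.pi * t / 32)             # 8 full cycles over the window
noisy = clean + 0.3 * rng.standard_normal(256)
denoised = frequency_selection(noisy, k=2)
```

Keeping only the dominant bins discards broadband noise energy, so `denoised` lies closer to the underlying signal than `noisy` does; a learned module would instead decide which components to keep from data.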
Pages: 15