TCCT: Tightly-coupled convolutional transformer on time series forecasting

Cited by: 68
Authors
Shen, Li [1 ]
Wang, Yangzhu [1 ]
Affiliations
[1] Beihang Univ, Dayuncun Residential Quarter, RM 807,8th Dormitory,29 Zhichun Rd, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Time series forecasting; Transformer; CNN;
DOI
10.1016/j.neucom.2022.01.039
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Time series forecasting is essential for a wide range of real-world applications. Recent studies have shown the superiority of the Transformer in dealing with such problems, especially long sequence time series input (LSTI) and long sequence time series forecasting (LSTF) problems. To improve the efficiency and enhance the locality of the Transformer, these studies combine the Transformer with CNNs to varying degrees. However, their combinations are loosely coupled and do not make full use of CNNs. To address this issue, we propose the concept of the tightly-coupled convolutional Transformer (TCCT) and three TCCT architectures that apply transformed CNN architectures to the Transformer: (1) CSPAttention: by fusing CSPNet with the self-attention mechanism, the computation cost of self-attention is reduced by 30% and its memory usage by 50%, while achieving equivalent or better prediction accuracy. (2) Dilated causal convolution: this method modifies the distilling operation proposed by Informer, replacing canonical convolutional layers with dilated causal convolutional layers to gain exponential growth of the receptive field. (3) Passthrough mechanism: applying the passthrough mechanism to a stack of self-attention blocks helps Transformer-like models obtain more fine-grained information at negligible extra computation cost. Our experiments on real-world datasets show that our TCCT architectures greatly improve the performance of existing state-of-the-art Transformer models on time series forecasting, including the canonical Transformer, LogTrans and Informer, with much lower computation and memory costs. (c) 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 131-145
Page count: 15
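To make the abstract's second idea concrete, below is a minimal PyTorch sketch of a dilated causal convolution distilling layer of the kind described: a left-padded (causal) dilated Conv1d followed by max-pooling that halves the sequence length between self-attention blocks, in place of the canonical Conv1d used in Informer's distilling operation. The class name, layer choices (BatchNorm1d, ELU, pool settings) and defaults here are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class DilatedCausalConvDistill(nn.Module):
    """Hypothetical distilling layer between self-attention blocks.

    Replaces a canonical Conv1d with a dilated causal Conv1d so that
    stacked layers gain an exponentially growing receptive field.
    """
    def __init__(self, d_model: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        # Left-pad so the convolution is causal: the output at step t
        # depends only on inputs at steps <= t.
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, dilation=dilation)
        self.norm = nn.BatchNorm1d(d_model)
        self.act = nn.ELU()
        # Informer's distilling halves the sequence length after each block.
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); Conv1d expects (batch, channels, seq_len)
        y = x.transpose(1, 2)
        y = nn.functional.pad(y, (self.pad, 0))  # causal left padding only
        y = self.act(self.norm(self.conv(y)))
        y = self.pool(y)                         # halve the sequence length
        return y.transpose(1, 2)

# Usage sketch: a length-96 input is distilled to length 48.
x = torch.randn(8, 96, 512)
layer = DilatedCausalConvDistill(512, dilation=2)
print(layer(x).shape)  # torch.Size([8, 48, 512])
```

Doubling the dilation at each stack level (1, 2, 4, ...) makes the receptive field grow exponentially with depth, which is the property the abstract attributes to this modification.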