Time-series anomaly detection with stacked Transformer representations and 1D convolutional network

Cited by: 67
Authors
Kim, Jina [1 ,2 ]
Kang, Hyeongwon [1 ]
Kang, Pilsung [1 ]
Affiliations
[1] Korea Univ, Sch Ind & Management Engn, 145 Anam Ro, Seoul 02841, South Korea
[2] Shinhan Bank, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Time series anomaly detection; Transformer; Convolutional Neural Network;
DOI
10.1016/j.engappai.2023.105964
Chinese Library Classification (CLC)
TP [Automation technology; computer technology];
Discipline code
0812;
Abstract
Time-series anomaly detection is the task of identifying data that deviate from the normal data distribution within continuously collected data. Because it supports system maintenance in various industries, time-series anomaly detection is an active research area. Most existing methodologies rely on Long Short-Term Memory (LSTM) or Convolutional Neural Network (CNN) architectures to model the temporal structure of time-series data. In this study, we propose an unsupervised, prediction-based time-series anomaly detection methodology using the Transformer, whose self-attention mechanism learns the dynamic patterns of sequential data more effectively than LSTM and CNN. The prediction model consists of an encoder comprising multiple Transformer encoder layers and a decoder that includes a 1D convolution layer. The encoder accumulates the output representation of each Transformer layer to obtain a multi-level, information-rich representation, which the decoder fuses through a 1D convolution operation. Consequently, the model can make predictions that account for both the global trend and the local variability of the input time series. Assuming the trained model produces predictions that follow the normal data distribution, the anomaly score is defined as the difference between the predicted and the actual value at the corresponding timestamp. Finally, data with an anomaly score above a threshold are detected as anomalies. Experiments on benchmark datasets show that the proposed method outperforms the baselines.
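The architecture described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the layer sizes, window length, and kernel size are assumptions, and the model here predicts only the next timestep from a sliding window, with the anomaly score taken as the absolute prediction error.

```python
# Hedged sketch of the described architecture (not the authors' code):
# multiple Transformer encoder layers whose per-layer outputs are accumulated,
# then fused by a 1D convolution; the anomaly score is the prediction error.
import torch
import torch.nn as nn


class TransformerConvPredictor(nn.Module):
    def __init__(self, n_features=8, d_model=32, n_layers=3, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads,
                                       dim_feedforward=64, batch_first=True)
            for _ in range(n_layers)
        ])
        # Fuse the stacked per-layer representations with a 1D convolution
        # (channels = n_layers * d_model after concatenation).
        self.fuse = nn.Conv1d(n_layers * d_model, d_model,
                              kernel_size=3, padding=1)
        self.out = nn.Linear(d_model, n_features)

    def forward(self, x):                     # x: (batch, seq_len, n_features)
        h = self.embed(x)
        reps = []
        for layer in self.layers:             # accumulate every layer's output
            h = layer(h)
            reps.append(h)
        stacked = torch.cat(reps, dim=-1)     # (batch, seq_len, n_layers*d_model)
        fused = self.fuse(stacked.transpose(1, 2)).transpose(1, 2)
        return self.out(fused[:, -1])         # predict the next timestep


def anomaly_scores(model, windows, targets):
    """Mean absolute prediction error per window; values above a chosen
    threshold are flagged as anomalies."""
    with torch.no_grad():
        pred = model(windows)
    return (pred - targets).abs().mean(dim=-1)
```

For example, `anomaly_scores(TransformerConvPredictor(), torch.randn(5, 16, 8), torch.randn(5, 8))` returns one non-negative score per window; in practice the model would first be trained to predict the next value on normal data only.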
Pages: 12