Anomaly detection in streaming data: A comparison and evaluation study

Cited by: 13
Authors
Vazquez, Felix Iglesias [1]
Hartl, Alexander [1]
Zseby, Tanja [1]
Zimek, Arthur [2]
Affiliations
[1] TU Wien, Gusshausstr 25-E389, A-1040 Vienna, Austria
[2] Univ Southern Denmark SDU, Campusvej 55, DK-5230 Odense M, Denmark
Keywords
Anomaly detection; Outlier detection; Streaming data; Concept drift; Algorithms
DOI
10.1016/j.eswa.2023.120994
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The detection of anomalies in streaming data faces complexities that make traditional static methods unsuitable due to computational costs and nonstationarity. We test and evaluate eight state-of-the-art algorithms against prominent challenges related to streaming data. Results show insights regarding accuracy, memory-dependency, parameterization, and pre-knowledge exploitation, thus revealing the high impact of some data characteristics on the choice of the most appropriate algorithm, namely: locality (i.e., whether outlierness is relative to local contexts), relativeness (i.e., whether past data defines outlierness), and concept drift (whether it is expected, and its intensity and frequency). In most applied cases, such factors can be inferred in advance through the use of historical data and domain knowledge. Assuming the viability of the studied methods in terms of time efficiency, this work discloses key findings to achieve optimal designs of streaming data anomaly detection in real-life applications.
Pages: 20