Robust Incremental Outlier Detection Approach Based on a New Metric in Data Streams

Cited by: 13
Authors
Degirmenci, Ali [1]
Karal, Omer [1]
Affiliations
[1] Ankara Beyazit Univ AYBU, Dept Elect & Elect Engn, TR-06010 Ankara, Turkey
Keywords
Anomaly detection; Measurement; Real-time systems; Labeling; Three-dimensional displays; Memory management; Licenses; Incremental learning; local outlier factor (LOF); new metric; outlier detection; robustness
DOI
10.1109/ACCESS.2021.3131402
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Detecting outliers in real time from multivariate streaming data is a vital and challenging research topic in many areas. The recently introduced incremental Local Outlier Factor (iLOF) approach and its variants have received considerable attention because they achieve high detection performance in data streams with varying distributions. However, these iLOF-based approaches still have some major limitations: i) poor detection in high-dimensional data; ii) difficulty in determining the proper number of nearest neighbors k; iii) assigning each sample a score that indicates its probability of being an outlier, instead of labeling the outliers; iv) inability to detect a long sequence (small cluster) of outliers. This article proposes a new robust outlier detection method (RiLOF), based on iLOF, that can effectively overcome these limitations. The RiLOF method introduces a novel metric called Median of Nearest Neighborhood Absolute Deviation (MoNNAD), which uses the median of the local absolute deviation of the samples' LOF values. Unlike previously reported LOF-based approaches, RiLOF is capable of achieving outlier detection in different data stream applications using the same hyperparameters. Extensive experiments performed on 15 different real-world data sets demonstrate that RiLOF remarkably outperforms 12 different state-of-the-art competitors.
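The abstract names the MoNNAD metric but does not give its formula. The sketch below is one possible reading, not the authors' implementation: it computes plain batch LOF scores (not the incremental iLOF update) and then, for each sample, takes the median absolute difference between its LOF value and those of its k nearest neighbors. The function names `knn`, `lof_scores`, and `neighbourhood_deviation`, the toy data, and the choice k = 3 are all assumptions made for illustration.

```python
import math
from statistics import median


def knn(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbours of points[i]."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]


def lof_scores(points, k):
    """Batch LOF in the style of Breunig et al.; no incremental update here."""
    neigh = [knn(points, i, k) for i in range(len(points))]
    k_dist = [n[-1][0] for n in neigh]  # distance to the k-th neighbour
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for n in neigh:
        reach = [max(d, k_dist[j]) for d, j in n]
        lrd.append(len(reach) / max(sum(reach), 1e-12))
    # LOF: mean ratio of the neighbours' density to the sample's own density.
    lof = [
        sum(lrd[j] for _, j in n) / (len(n) * lrd[i])
        for i, n in enumerate(neigh)
    ]
    return lof, neigh


def neighbourhood_deviation(lof, neigh):
    """ASSUMPTION: median of |LOF(p) - LOF(o)| over the neighbours o of p."""
    return [
        median(abs(lof[i] - lof[j]) for _, j in n)
        for i, n in enumerate(neigh)
    ]


# Toy data (hypothetical): a 3x3 grid of inliers plus one distant point.
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
lof, neigh = lof_scores(points, k=3)
deviation = neighbourhood_deviation(lof, neigh)
most_deviant = max(range(len(points)), key=deviation.__getitem__)
print(points[most_deviant], round(lof[most_deviant], 2))
```

On this toy set the distant point has both the highest LOF value and the largest neighborhood deviation. How RiLOF turns such a deviation into a label, and how it handles streaming updates, is not stated in the abstract and is left out of the sketch.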
Pages: 160347-160360
Page count: 14