Scalable Outlier Detection Using Distance Projections

Cited by: 0
Authors
Cao, Jin [1]
Hu, Rui [2]
Affiliations
[1] Nokia Bell Labs, Murray Hill, NJ 07974 USA
[2] Univ Calif Davis, Davis, CA 95616 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2021
Keywords
outlier detection; local outlier factor; random projection; scalability; nearest neighbour
DOI
10.1109/BigData52589.2021.9671856
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data exploration is a critical component of data science. However, it becomes increasingly difficult when a dataset is large, high-dimensional, and contains dirty records. In this paper, we propose a solution for detecting outliers in large high-dimensional datasets using distance-based projections. Our solution first maps the high-dimensional data to multiple one-dimensional values using each point's distances to a random set of reference points. For each projected value of a data point, we compute a local outlier factor score, and we then combine these scores into a single score that can be used to detect outliers. Our solution is computationally much cheaper than traditional methods. We demonstrate the effectiveness of our solution using both simulation and real data studies.
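The pipeline described in the abstract (distance projections, a per-projection local outlier factor, then a combined score) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how reference points are chosen or how scores are combined, so sampling reference points from the data and averaging the per-projection scores are assumptions made here, as are the function names `lof_1d` and `projected_lof_scores` and the defaults `n_refs` and `k`. The 1-D LOF is computed by brute force for readability; a sort-based neighbor search would be the natural way to get the speed the abstract alludes to.

```python
import math
import random


def lof_1d(values, k):
    """Brute-force local outlier factor (LOF) for a list of 1-D values."""
    n = len(values)
    neighbors, k_dist = [], []
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        d = sorted((abs(values[i] - values[j]), j) for j in range(n) if j != i)
        neighbors.append(d[:k])
        k_dist.append(d[k - 1][0])  # distance to the k-th nearest neighbor
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], dist) for dist, j in neighbors[i]) / k
        lrd.append(1.0 / max(reach, 1e-12))  # guard against duplicate values
    # LOF: mean density of the neighbors divided by the point's own density.
    return [sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


def projected_lof_scores(data, n_refs=10, k=5, seed=0):
    """Average 1-D LOF scores over distance projections to random references.

    Assumptions (not stated in the abstract): reference points are sampled
    from the data itself, and per-projection scores are combined by the mean.
    """
    rng = random.Random(seed)
    refs = rng.sample(data, n_refs)
    total = [0.0] * len(data)
    for ref in refs:
        # Project every point to one dimension: its distance to the reference.
        projected = [math.dist(x, ref) for x in data]
        for i, score in enumerate(lof_1d(projected, k)):
            total[i] += score
    return [t / n_refs for t in total]
```

Calling `projected_lof_scores(data)` on a list of equal-length numeric tuples returns one score per point, with larger values suggesting stronger outliers. Scores near 1 indicate a point whose local density matches that of its neighbors in the projected space.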
Pages: 4431-4440
Page count: 10