An Efficient Density-Based Local Outlier Detection Approach for Scattered Data

Cited by: 21
Authors
Su, Shubin [2]
Xiao, Limin [1, 2]
Ruan, Li [2]
Gu, Fei [2]
Li, Shupan [2]
Wang, Zhaokai [2]
Xu, Rongbin [2]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Outlier detection; local outlier factor; neighborhood variance; rough clustering; scattered dataset; DISTANCE-BASED OUTLIERS; ALGORITHMS;
DOI
10.1109/ACCESS.2018.2886197
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Since the local outlier factor (LOF) was first proposed, a large family of local outlier detection approaches has been derived from it. Because existing approaches consider only the overall separation between an object and its neighbors and ignore the degree of dispersion among them, their precision degrades to varying degrees on scattered datasets. In addition, although outliers make up only a small fraction of a dataset, existing approaches compute the local outlier factor for every object during detection, which greatly reduces their efficiency. In this paper, we define a new local outlier factor, the local deviation coefficient (LDC), which takes full advantage of the distribution of the object and its neighbors. We then propose a safe non-outlier elimination approach, rough clustering based on multi-level queries (RCMLQ), which preprocesses the dataset to eliminate as many non-outlier objects as possible. Finally, we propose an efficient local outlier detection approach, efficient density-based local outlier detection for scattered data (E2DLOS), built on the LDC and RCMLQ. RCMLQ greatly reduces the number of objects for which a local outlier factor must be computed, and the LDC is more sensitive to the degree of anomaly in scattered datasets, so E2DLOS improves on existing local outlier detection approaches in both time efficiency and detection accuracy. Experiments show that the LDC better reflects the true abnormality of the data in scattered datasets. Moreover, RCMLQ can be used together with traditional methods for speeding up nearest neighbor search, which further improves the efficiency of the E2DLOS algorithm by about 16%.
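The abstract takes the classic local outlier factor as its baseline but does not reproduce its definition. As a point of reference only, the following is a minimal brute-force sketch of the standard LOF score in Python. It is not the LDC, RCMLQ, or E2DLOS method, whose definitions are not given in this record. The function name `lof_scores`, the parameter `k`, and the assumption that the input contains no duplicate points are illustrative choices, not taken from the paper.

```python
import math


def lof_scores(points, k):
    """Classic LOF scores via brute-force neighbor search (baseline sketch).

    Assumes `points` is a list of equal-length numeric tuples without
    duplicates. Scores near 1 indicate inliers; larger scores indicate
    objects that are sparser than their neighbors.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []   # k-distance neighborhood of each object (ties included)
    k_distance = []  # distance from each object to its k-th nearest neighbor
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_distance.append(kd)
        neighbors.append([j for j in order if dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        mean_reach = sum(reach) / len(reach)
        if mean_reach == 0:
            raise ValueError("duplicate points are not handled in this sketch")
        lrd.append(1.0 / mean_reach)

    # LOF: mean ratio of the neighbors' density to the object's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]
```

For example, calling `lof_scores([(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)], k=2)` scores every object in the dataset, including the four clustered ones. This is the all-objects cost that the abstract says RCMLQ is designed to reduce by eliminating non-outlier objects beforehand.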
Pages: 1006-1020
Number of pages: 15