An Efficient Density-Based Local Outlier Detection Approach for Scattered Data

Cited by: 21
Authors
Su, Shubin [2]
Xiao, Limin [1, 2]
Ruan, Li [2]
Gu, Fei [2]
Li, Shupan [2]
Wang, Zhaokai [2]
Xu, Rongbin [2]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Outlier detection; local outlier factor; neighborhood variance; rough clustering; scattered dataset; distance-based outliers; algorithms
DOI
10.1109/ACCESS.2018.2886197
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Since the local outlier factor was first proposed, a large family of local outlier detection approaches has been derived from it. Because existing approaches consider only the overall separation between an object and its neighbors and ignore the degree of dispersion between them, their precision degrades to varying degrees on scattered datasets. In addition, although outliers make up only a small fraction of a dataset, existing approaches compute the local outlier factor for every object during detection, which greatly reduces their efficiency. In this paper, we redefine the local outlier factor as the local deviation coefficient (LDC), which takes full advantage of the distribution of the object and its neighbors. We then propose a safe non-outlier elimination approach, rough clustering based on multi-level queries (RCMLQ), which preprocesses the dataset to eliminate as many non-outlier objects as possible. Finally, we propose an efficient local outlier detection approach, efficient density-based local outlier detection for scattered data (E2DLOS), based on the LDC and RCMLQ. RCMLQ greatly reduces the number of objects for which a local outlier factor must be computed, and the LDC is more sensitive to the degree of anomaly in scattered datasets, so E2DLOS improves on existing local outlier detection approaches in both time efficiency and detection accuracy. Experiments show that the LDC better reflects the true abnormality of the data in scattered datasets, and that RCMLQ can be combined with traditional methods for accelerating nearest-neighbor search, which further improves the efficiency of the E2DLOS algorithm by about 16%.
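For orientation, the classic local outlier factor that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the standard LOF definition only, not the paper's LDC, RCMLQ, or E2DLOS, whose formulas are not given in this record. The function names `k_nearest` and `lof_scores` are hypothetical, the neighbor search is brute force, ties at the k-distance are cut to exactly k neighbors, and the points are assumed to be distinct.

```python
import math


def k_nearest(points, i, k):
    """Return the k nearest neighbors of points[i] as (distance, index) pairs.

    Brute-force search; ties are broken by index (a simplification).
    """
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]


def lof_scores(points, k):
    """Compute the classic local outlier factor for every point.

    Assumes distinct points, so no reachability sum is zero.
    """
    n = len(points)
    neighbors = [k_nearest(points, i, k) for i in range(n)]

    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [nbrs[-1][0] for nbrs in neighbors]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(p, o) = max(k_dist(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbors divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]
```

Calling `lof_scores` on a compact cluster plus one distant point should give scores near 1 inside the cluster and a clearly larger score for the distant point. Note that every object is scored here, which is the cost the abstract says RCMLQ is meant to reduce.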
Pages: 1006-1020
Page count: 15