An Improved KNN Based Outlier Detection Algorithm for Large Datasets

Cited by: 0
Authors
Wang, Qian [1]
Zheng, Min [1]
Affiliation
[1] Chongqing Univ, Sch Comp Sci, Chongqing 630044, Peoples R China
Source
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2010, PT I | 2010, Vol. 6440
Keywords
Data mining; KNN; Outlier detection
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection is becoming a hot issue in data mining because outliers often carry useful information. In this paper, we propose an improved KNN-based outlier detection algorithm built on two-stage clustering. The first stage partitions the dataset into several clusters and computes the Kth nearest neighbor of each point within its cluster, which avoids scanning the entire dataset for each calculation. The second stage partitions the clusters obtained in the first stage and prunes a partition as soon as it is determined that it cannot contain outliers, which yields substantial savings in computation. Experimental results on both synthetic and real-life datasets demonstrate that our algorithm is efficient on large datasets.
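
The first-stage idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of fixed cluster centers in place of a real clustering step, and the rule that a point with fewer than k cluster mates gets an infinite score are all assumptions made for illustration. The second-stage pruning is not shown, because the abstract does not specify its bounds.

```python
import math


def assign_to_clusters(points, centers):
    """Group points by nearest center (a stand-in for a real clustering step)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kth_nn_distance(point, others, k):
    """Distance from `point` to its k-th nearest neighbor among `others`."""
    if len(others) < k:
        # Assumption: too few cluster mates means the point is treated as maximally isolated.
        return math.inf
    return sorted(math.dist(point, q) for q in others)[k - 1]


def top_outliers(points, centers, k, n):
    """Return the n (score, point) pairs with the largest within-cluster k-th NN distance."""
    scored = []
    for cluster in assign_to_clusters(points, centers):
        for i, p in enumerate(cluster):
            # Only cluster mates are searched, not the whole dataset.
            others = cluster[:i] + cluster[i + 1:]
            scored.append((kth_nn_distance(p, others, k), p))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


# Hypothetical toy data: two tight groups plus one stray point at (5, 0).
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (5.0, 0.0),
]
centers = [(0.0, 0.0), (10.0, 10.0)]
result = top_outliers(points, centers, k=2, n=1)
```

Restricting the neighbor search to one cluster gives an upper bound on a point's true Kth-nearest-neighbor distance, so an exact method would still need to check neighboring clusters for points near a cluster boundary.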
Pages: 585 - 592
Page count: 8
Related papers
50 in total
  • [1] KNN Based Outlier Detection Algorithm in Large Dataset
    Yang, Peng
    Huang, Biao
    2008 INTERNATIONAL WORKSHOP ON EDUCATION TECHNOLOGY AND TRAINING AND 2008 INTERNATIONAL WORKSHOP ON GEOSCIENCE AND REMOTE SENSING, VOL 1, PROCEEDINGS, 2009, : 611 - 613
  • [2] Cell-based outlier detection algorithm: A fast outlier detection algorithm for large datasets
    Wan, You
    Bian, Fuling
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2008, 5012 : 1042 - 1048
  • [3] An Outlier Detection Algorithm based on KNN-kernel Density Estimation
    Wahid, Abdul
    Rao, Annavarapu Chandra Sekhara
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [4] An Improved Methodology for Outlier Detection in Dynamic Datasets
    Xu, Shu
    Baldea, Michael
    Edgar, Thomas F.
    Wojsznis, Willy
    Blevins, Terrence
    Nixon, Mark
    AICHE JOURNAL, 2015, 61 (02) : 419 - 433
  • [5] An Efficient Algorithm for Distributed Outlier Detection in Large Multi-Dimensional Datasets
    Wang, Xi-Te
    Shen, De-Rong
    Bai, Mei
    Nie, Tie-Zheng
    Kou, Yue
    Yu, Ge
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2015, 30 (06) : 1233 - 1248
  • [6] Optimisation of outlier data mining algorithm for large datasets based on unit
    Li Y.
    Zhou X.
    International Journal of Information Technology and Management, 2023, 22 (3-4) : 175 - 189
  • [7] An Efficient Algorithm for Distributed Outlier Detection in Large Multi-Dimensional Datasets
    Xi-Te Wang
    De-Rong Shen
    Mei Bai
    Tie-Zheng Nie
    Yue Kou
    Ge Yu
    Journal of Computer Science and Technology, 2015, 30 : 1233 - 1248
  • [8] EBOD: An ensemble-based outlier detection algorithm for noisy datasets
    Ouyang, Boya
    Song, Yu
    Li, Yuhai
    Sant, Gaurav
    Bauchy, Mathieu
    KNOWLEDGE-BASED SYSTEMS, 2021, 231
  • [9] An efficient reference-based approach to outlier detection in large datasets
    Pei, Yaling
    Zaiane, Osmar R.
    Gao, Yong
    ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006, : 478 - 487
  • [10] The spatial outlier mining algorithm based on the KNN graph
    Cao, Lijun
    Liu, Xiyin
    Wang, Zhi Ping
    Zhang, Zhongping
    Journal of Software, 2013, 8 (12) : 3158 - 3165