kNNVWC: An Efficient k-Nearest Neighbors Approach Based on Various-Widths Clustering

被引:33
作者
Almalawi, Abdul Mohsen [1 ]
Fahad, Adil [2 ]
Tari, Zahir [3 ]
Cheema, Muhammad Aamir [4 ]
Khalil, Ibrahim [3 ]
机构
[1] King Abdulaziz Univ, Sch Comp Sci & Informat Technol, Jeddah 21413, Saudi Arabia
[2] Al Baha Univ, Coll Comp Sci & Informat Technol, Dept Comp Sci, Al Baha, Saudi Arabia
[3] RMIT Univ, Sch Comp Sci & Informat Technol CSIT, Distributed Syst & Networking DSN Discipline, Melbourne, Vic 3000, Australia
[4] Monash Univ, Fac Informat Technol, Clayton, Vic 3168, Australia
基金
澳大利亚研究理事会;
关键词
Clustering; K-nearest neighbour; high dimensionality; performance; SCADA; DISTANCE-BASED OUTLIERS; ALGORITHM; SEARCH;
D O I
10.1109/TKDE.2015.2460735
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The k-nearest neighbor approach (k-NN) has been extensively used as a powerful non-parametric technique in many scientific and engineering applications. However, this approach incurs a large computational cost. Hence, this issue has become an active research field. In this work, a novel k-NN approach based on various-widths clustering, named kNNVWC, to efficiently find k-NNs for a query object from a given data set, is presented. kNNVWC does clustering using various widths, where a data set is clustered with a global width first and each produced cluster that meets the predefined criteria is recursively clustered with its own local width that suits its distribution. This reduces the clustering time, in addition to balancing the number of produced clusters and their respective sizes. Maximum efficiency is achieved by using triangle inequality to prune unlikely clusters. Experimental results demonstrate that kNNVWC performs well in finding k-NNs for query objects compared to a number of k-NN search algorithms, especially for a data set with high dimensions, various distributions and large size.
引用
收藏
页码:68 / 81
页数:14
相关论文
共 47 条
  • [31] A simple algorithm for nearest neighbor search in high dimensions
    Nene, SA
    Nayar, SK
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997, 19 (09) : 989 - 1003
  • [32] Oldmeadow J, 2004, LECT NOTES ARTIF INT, V3056, P255
  • [33] Omohundro Stephen M, 1989, Five Balltree Construction Algorithms.
  • [34] Padraig C., 2007, Mult. Class. Syst., V34, P1, DOI [DOI 10.48550/ARXIV.2004.04523, DOI 10.1145/3459665]
  • [35] Portnoy L., 2001, Proc. ACM CSS Workshop on Data Mining Applied to Security (DMSA), P5
  • [36] Prerau M.J., 2000, THESIS
  • [37] On the calibration of sensor arrays for pattern recognition using the minimal number of experiments
    Rodriguez-Lujan, Irene
    Fonollosa, Jordi
    Vergara, Alexander
    Homer, Margie
    Huerta, Ramon
    [J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2014, 130 : 123 - 134
  • [38] Roussopoulos N., 1995, ACM SIGMOD RECORD, V24, P7179
  • [39] Shakhnarovich G., 2008, IEEE Trans. Neural Networks, V19, P377
  • [40] Power Transformer Fault Classification Based on Dissolved Gas Analysis by Implementing Bootstrap and Genetic Programming
    Shintemirov, A.
    Tang, W.
    Wu, Q. H.
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2009, 39 (01): : 69 - 79