kNNVWC: An Efficient k-Nearest Neighbors Approach Based on Various-Widths Clustering

被引：34

作者：

Almalawi, Abdul Mohsen ^{[1
]}

Fahad, Adil ^{[2
]}

Tari, Zahir ^{[3
]}

Cheema, Muhammad Aamir ^{[4
]}

Khalil, Ibrahim ^{[3
]}

机构：

[1] King Abdulaziz Univ, Sch Comp Sci & Informat Technol, Jeddah 21413, Saudi Arabia

[2] Al Baha Univ, Coll Comp Sci & Informat Technol, Dept Comp Sci, Al Baha, Saudi Arabia

[3] RMIT Univ, Sch Comp Sci & Informat Technol CSIT, Distributed Syst & Networking DSN Discipline, Melbourne, Vic 3000, Australia

[4] Monash Univ, Fac Informat Technol, Clayton, Vic 3168, Australia

来源：

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2016年 / 28卷 / 01期

基金：

澳大利亚研究理事会;

关键词：

Clustering; K-nearest neighbour; high dimensionality; performance; SCADA; DISTANCE-BASED OUTLIERS; ALGORITHM; SEARCH;

D O I：

10.1109/TKDE.2015.2460735

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

The k-nearest neighbor approach (k-NN) has been extensively used as a powerful non-parametric technique in many scientific and engineering applications. However, this approach incurs a large computational cost. Hence, this issue has become an active research field. In this work, a novel k-NN approach based on various-widths clustering, named kNNVWC, to efficiently find k-NNs for a query object from a given data set, is presented. kNNVWC does clustering using various widths, where a data set is clustered with a global width first and each produced cluster that meets the predefined criteria is recursively clustered with its own local width that suits its distribution. This reduces the clustering time, in addition to balancing the number of produced clusters and their respective sizes. Maximum efficiency is achieved by using triangle inequality to prune unlikely clusters. Experimental results demonstrate that kNNVWC performs well in finding k-NNs for query objects compared to a number of k-NN search algorithms, especially for a data set with high dimensions, various distributions and large size.

引用

页码：68 / 81

页数：14

共 47 条

[1]

Almalawi A., 2013, C LOCAL COMPUT NETW, P639, DOI DOI 10.1109/LCN.2013.6761301

[2] An unsupervised anomaly-based detection approach for integrity attacks on SCADA systems [J].

Almalawi, Abdulmohsen ;

Yu, Xinghuo ;

Tari, Zahir ;

Fahad, Adil ;

Khalil, Ibrahim .

COMPUTERS & SECURITY, 2014, 46 :94-110

[3]

Andoni A, 2006, ANN IEEE SYMP FOUND, P459

[4] Distance-based detection and prediction of outliers [J].

Angiulli, F ;

Basta, S ;

Pizzuti, C .

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2006, 18 (02) :145-160

[5] DOLPHIN: An Efficient Algorithm for Mining Distance-Based Outliers in Very Large Datasets [J].

Angiulli, Fabrizio ;

Fassetti, Fabio .

ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2009, 3 (01)

[6]

[Anonymous], 2007, P 33 INT C VER LARG

[7]

Bawa M, 2005, WWW 05, DOI [DOI 10.1145/1060745.1060840, 10.1145/1060745.1060840]

[8]

Bay S.D, 2003, KDD 03, P29, DOI [10.1145/956750.956758, DOI 10.1145/956750.956758]

[9]

Beygelzimer A., 2006, ICML, DOI DOI 10.1145/1143844.1143857

[10]

Bozkaya T., 1997, SIGMOD Record, V26, P357, DOI 10.1145/253262.253345

← 1 2 3 4 5 →