A novel version of k nearest neighbor: Dependent nearest neighbor

Cited by: 63
Authors
Ertugrul, Omer Faruk [1 ]
Tagluk, Mehmet Emin [2 ]
Affiliations
[1] Batman Univ, Dept Elect & Elect Engn, TR-72100 Batman, Turkey
[2] Inonu Univ, Dept Elect & Elect Engn, TR-44010 Malatya, Turkey
Keywords
Dependency; Similarity; k nearest neighbor; Dependent nearest neighbor; CLASSIFICATION; CLASSIFIERS; RULE
DOI
10.1016/j.asoc.2017.02.020
CLC number
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
k nearest neighbor (kNN) is one of the basic procedures behind many machine learning methods. In kNN, the relation of a query to a neighboring sample is measured by a similarity metric, such as Euclidean distance. The process starts by mapping the training dataset onto a one-dimensional distance space based on the calculated similarities, and then labels the query with the most dominant label or the mean of the labels of the k nearest neighbors, for classification or regression problems, respectively. The number of nearest neighbors (k) is chosen according to the desired level of success. Nonetheless, two distinct samples may lie at equal distances from the query yet at different angles in the feature space. The similarity of the query to these two samples should therefore be weighted according to the angle between the query and each sample, so that the two otherwise equal distances can be differentiated by angular information. This idea can be analyzed in the context of dependency and can be utilized to increase the precision of a classifier. From this point of view, instead of kNN, the query is labeled according to its nearest dependent neighbors, which are determined by a joint function built on both similarity and dependency. The method may therefore be called dependent NN (d-NN). To demonstrate d-NN, it is applied to synthetic datasets with different statistical distributions and to four benchmark datasets: Pima Indian, Hepatitis, approximate Sinc, and CASP. The results showed the superiority of d-NN in terms of accuracy and computational cost compared with the other popular machine learning methods employed. (C) 2017 Elsevier B.V. All rights reserved.
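The abstract does not spell out the joint function of similarity and dependency, so the following is only a minimal illustrative sketch in Python of the general idea: a convex mixture of normalized Euclidean distance and the angle between each training vector and the query. The function name dnn_classify and the mixing weight alpha are hypothetical, introduced here for illustration; the paper's actual joint function may differ.

import numpy as np
from collections import Counter

def dnn_classify(X_train, y_train, query, k=5, alpha=0.5):
    # Similarity term: Euclidean distance from each training sample to the query.
    dists = np.linalg.norm(X_train - query, axis=1)
    # Dependency term: angle between each training vector and the query vector
    # in feature space, obtained from the cosine of the enclosed angle.
    cos = X_train @ query / (np.linalg.norm(X_train, axis=1) * np.linalg.norm(query) + 1e-12)
    angles = np.arccos(np.clip(cos, -1.0, 1.0))
    # ASSUMED joint score (not from the paper): convex mix of the two terms,
    # each normalized to [0, 1]; smaller score means a "nearer dependent" neighbor.
    score = alpha * dists / (dists.max() + 1e-12) + (1 - alpha) * angles / np.pi
    nearest = np.argsort(score)[:k]
    # Classification: majority label among the k nearest dependent neighbors.
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Usage on a toy two-class problem:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(dnn_classify(X, y, np.array([0.5, 0.5]), k=7))

Under this scoring, two samples at the same Euclidean distance from the query receive different scores when their angles to the query differ, which is exactly the situation the abstract uses to motivate d-NN.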
Pages: 480-490 (11 pages)