A novel two-level nearest neighbor classification algorithm using an adaptive distance metric

Cited by: 13
Authors
Gao, Yunlong [1 ]
Pan, Jinyan [2 ]
Ji, Guoli [1 ]
Yang, Zijiang [3 ]
Affiliations
[1] Xiamen Univ, Dept Automat, Xiamen 361005, Peoples R China
[2] Jimei Univ, Coll Informat Engn, Xiamen 361021, Peoples R China
[3] York Univ, Sch Informat Technol, Toronto, ON M3J 1P3, Canada
Funding
National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education;
Keywords
Classification; Nearest neighbors; Metric learning; Mean-absolute error; AdaBoost; MARGIN;
DOI
10.1016/j.knosys.2011.07.010
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When the training set contains an infinite number of samples, the outcome of nearest neighbor classification (kNN) is independent of the adopted distance metric. In practice, however, the number of training samples is always finite, so the choice of distance metric becomes crucial to the performance of kNN. We propose a novel two-level nearest neighbor algorithm (TLNN) that minimizes the mean-absolute error between the misclassification rates of kNN with finite and with infinite numbers of training samples. At the low level, Euclidean distance is used to determine a local subspace centered at an unlabeled test sample. At the high level, AdaBoost guides the extraction of local information. TLNN maintains data invariance and produces neighborhoods that are highly stretched or elongated along different directions. The algorithm reduces excessive dependence on statistical methods that learn prior knowledge from the training data. Even a linear combination of a few base classifiers produced by the weak learner in AdaBoost can yield much better kNN classifiers. Experiments on both synthetic and real-world data sets justify the proposed method. (C) 2011 Elsevier B.V. All rights reserved.
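The two-level idea described in the abstract can be illustrated with a minimal sketch. This is an assumption-laden reconstruction of the general scheme (a Euclidean neighborhood at the low level, AdaBoost on that neighborhood at the high level), not the authors' exact TLNN algorithm; the function name `two_level_predict` and all parameter choices are illustrative.

```python
# Hedged sketch of a generic two-level nearest-neighbor classifier
# (NOT the exact TLNN of Gao et al.): level 1 selects a Euclidean
# neighborhood around the query point; level 2 fits an AdaBoost
# ensemble on that local subspace and uses it to label the query.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import NearestNeighbors

def two_level_predict(X_train, y_train, x_query, n_neighbors=50):
    # Low level: Euclidean distance defines a local subspace
    # centered at the unlabeled test sample.
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X_train)
    idx = nn.kneighbors(x_query.reshape(1, -1), return_distance=False)[0]
    X_local, y_local = X_train[idx], y_train[idx]
    if len(np.unique(y_local)) == 1:
        # Neighborhood is pure: no local boundary to learn.
        return y_local[0]
    # High level: AdaBoost extracts discriminative local information.
    booster = AdaBoostClassifier(n_estimators=20, random_state=0)
    booster.fit(X_local, y_local)
    return booster.predict(x_query.reshape(1, -1))[0]

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
pred = two_level_predict(X[:-1], y[:-1], X[-1])
```

Fitting the booster only on the local neighborhood is what lets the effective decision surface stretch differently in different regions, which is the intuition behind the adaptive metric.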
Pages: 103-110
Page count: 8
References
24 in total
[1]  
[Anonymous], ADV NEURAL INFORM PR
[2]  
[Anonymous], 1999, THESIS CAMBRIDGE U
[3]  
Bar-Hillel AB, 2005, J MACH LEARN RES, V6, P937
[4]  
BIAN Zhao-qi, 2000, Pattern recognition
[5]  
Blake C. L., 1998, UCI Repository of Machine Learning Databases
[6]   Edited AdaBoost by weighted kNN [J].
Gao, Yunlong ;
Gao, Feng .
NEUROCOMPUTING, 2010, 73 (16-18) :3079-3088
[7]   A time-series modeling method based on the boosting gradient-descent theory [J].
Gao YunLong ;
Pan JinYan ;
Ji GuoLi ;
Gao Feng .
SCIENCE CHINA-TECHNOLOGICAL SCIENCES, 2011, 54 (05) :1325-1337
[8]   K nearest neighbours with mutual information for simultaneous classification and missing data imputation [J].
Garcia-Laencina, Pedro J. ;
Sancho-Gomez, Jose-Luis ;
Figueiras-Vidal, Anibal R. ;
Verleysen, Michel .
NEUROCOMPUTING, 2009, 72 (7-9) :1483-1493
[9]  
Goldberger J., 2004, Advances in Neural Information Processing Systems, V17
[10]  
Graepel T, 2004, ADV NEUR IN, V16, P33