On optimum choice of k in nearest neighbor classification

Cited by: 86
Author
Ghosh, Anil K. [1]
Affiliation
[1] Indian Stat Inst, Theoret Stat & Math Unit, Kolkata 700108, India
Keywords
accuracy index; Bayesian strength function; cross-validation; misclassification rate; neighborhood parameter; non-informative prior; optimal Bayes risk; posterior probability
DOI
10.1016/j.csda.2005.06.007
Chinese Library Classification number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
A major issue in k-nearest neighbor classification is how to choose the optimum value of the neighborhood parameter k. Popular cross-validation techniques often fail to guide us well in selecting k mainly due to the presence of multiple minimizers of the estimated misclassification rate. This article investigates a Bayesian method in this connection, which solves the problem of multiple optimizers. The utility of the proposed method is illustrated using some benchmark data sets. (C) 2005 Elsevier B.V. All rights reserved.
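The multiple-minimizer problem mentioned in the abstract can be illustrated with a minimal sketch. The code below is not the Bayesian method of the article; it only computes the leave-one-out cross-validation estimate of the misclassification rate for several values of k and lists every k that attains the minimum. The function names (`loo_error_rates`, `minimizers`, `make_toy_data`), the two-class Gaussian toy data, and the tie-breaking rule in the vote are all assumptions made for illustration.

```python
import random
from collections import Counter


def loo_error_rates(points, labels, ks):
    """Leave-one-out misclassification rate of k-NN for each k in ks."""
    n = len(points)
    errors = {k: 0 for k in ks}
    for i in range(n):
        # Rank all other points by squared Euclidean distance to point i.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(points[i], points[j])),
        )
        for k in ks:
            votes = Counter(labels[j] for j in neighbors[:k])
            # Majority vote; ties go to the smallest label (an arbitrary choice).
            top = max(votes.values())
            predicted = min(lab for lab, count in votes.items() if count == top)
            if predicted != labels[i]:
                errors[k] += 1
    return {k: errors[k] / n for k in ks}


def minimizers(rates):
    """Return every k whose estimated error equals the minimum."""
    best = min(rates.values())
    return [k for k, r in rates.items() if r == best]


def make_toy_data(n_per_class=30, seed=0):
    """Hypothetical two-class data: unit-variance Gaussians with shifted means."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, center in enumerate([(0.0, 0.0), (1.5, 1.5)]):
        for _ in range(n_per_class):
            points.append((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)))
            labels.append(label)
    return points, labels


points, labels = make_toy_data()
rates = loo_error_rates(points, labels, ks=range(1, 16, 2))
print(rates)
print("k values attaining the minimum:", minimizers(rates))
```

Because the estimated rate is a count of errors divided by the sample size, it can take only a limited number of distinct values, so several k may share the same minimum. In that case cross-validation alone gives no basis for preferring one of them.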
Pages: 3113-3123
Number of pages: 11