Stabilized Nearest Neighbor Classifier and its Statistical Properties

被引:12
作者
Sun, Will Wei [1 ,2 ]
Qiao, Xingye [3 ]
Cheng, Guang [4 ]
机构
[1] Yahoo Labs, Sunnyvale, CA USA
[2] Univ Miami, Sch Business Adm, Dept Management Sci, Miami, FL USA
[3] SUNY Binghamton, Dept Math Sci, Binghamton, NY 13901 USA
[4] Purdue Univ, Dept Stat, W Lafayette, IN 47906 USA
关键词
Bayes risk; Classification; Margin condition; Minimax optimality; Reproducibility; Stability; CONSISTENCY; CONVERGENCE;
D O I
10.1080/01621459.2015.1089772
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
The stability of statistical analysis is an important indicator for reproducibility, which is one main principle of the scientific method. It entails that similar statistical conclusions can be reached based on independent samples from the same underlying population. In this article, we introduce a general measure of classification instability (CIS) to quantify the sampling variability of the prediction made by a classification method. Interestingly, the asymptotic CIS of any weighted nearest neighbor classifier turns out to be proportional to the Euclidean norm of its weight vector. Based on this concise form, we propose a stabilized nearest neighbor (SNN) classifier, which distinguishes itself from other nearest neighbor classifiers, by taking the stability into consideration. In theory, we prove that SNN attains the minimax optimal convergence rate in risk, and a sharp convergence rate in CIS. The latter rate result is established for general plug-in classifiers under a low-noise condition. Extensive simulated and real examples demonstrate that SNN achieves a considerable improvement in CIS over existing nearest neighbor classifiers, with comparable classification accuracy. We implement the algorithm in a publicly available R package snn. Supplementary materials for this article are available online.
引用
收藏
页码:1254 / 1265
页数:12
相关论文
共 31 条
  • [1] Adomavicius G., 2010, ACM C REC SYST, P47
  • [2] [Anonymous], 2014, Implementing Reproducible Research
  • [3] Fast learning rates for plug-in classifiers
    Audibert, Jean-Yves
    Tsybakov, Alexandre B.
    [J]. ANNALS OF STATISTICS, 2007, 35 (02) : 608 - 633
  • [4] Bache K., 2013, UCI Machine Learning Repository
  • [5] Ben-Hur Asa, 2002, Pac Symp Biocomput, P6
  • [6] Biau G, 2010, J MACH LEARN RES, V11, P687
  • [7] Stability and generalization
    Bousquet, O
    Elisseeff, A
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2002, 2 (03) : 499 - 526
  • [8] Breiman L, 1996, ANN STAT, V24, P2350
  • [9] Bühlmann P, 2002, ANN STAT, V30, P927
  • [10] NEAREST NEIGHBOR PATTERN CLASSIFICATION
    COVER, TM
    HART, PE
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 1967, 13 (01) : 21 - +