A probabilistic nearest neighbour method for statistical pattern recognition

Cited by: 102
Authors
Holmes, CC [1]
Adams, NM [1]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Dept Math, London SW7 2BZ, England
Keywords
Bayesian k nearest neighbour; nonparametric classification; probabilistic nearest neighbour
DOI
10.1111/1467-9868.00338
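Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract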
Nearest neighbour algorithms are among the most popular methods used in statistical pattern recognition. The models are conceptually simple and empirical studies have shown that their performance is highly competitive against other techniques. However, the lack of a formal framework for choosing the size of the neighbourhood k is problematic. Furthermore, the method can only make discrete predictions by reporting the relative frequency of the classes in the neighbourhood of the prediction point. We present a probabilistic framework for the k-nearest-neighbour method that largely overcomes these difficulties. Uncertainty is accommodated via a prior distribution on k as well as in the strength of the interaction between neighbours. These prior distributions propagate uncertainty through to proper probabilistic predictions that have continuous support on (0, 1). The method makes no assumptions about the distribution of the predictor variables. The method is also fully automatic with no user-set parameters and empirically it proves to be highly accurate on many bench-mark data sets.
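The limitation described in the abstract, that standard k-nearest-neighbour prediction reports only the relative class frequencies in the neighbourhood, can be illustrated with a minimal sketch. This is plain k-nearest-neighbour voting, not the probabilistic method proposed in the paper; the function name `knn_class_frequencies`, the Euclidean distance and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter


def knn_class_frequencies(train_x, train_y, query, k):
    """Relative class frequencies among the k nearest training points.

    Hypothetical helper: uses Euclidean distance (an assumption) and
    returns a dict mapping each class label to count / k.
    """
    # Rank training indices by distance from the query point.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    # Count the class labels of the k closest points.
    counts = Counter(train_y[i] for i in order[:k])
    # Every output is a multiple of 1/k, so predictions are discrete.
    return {c: counts[c] / k for c in sorted(set(train_y))}


# Toy data, invented for illustration.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.9, 1.0), (1.0, 0.8), (0.5, 0.5)]
train_y = [0, 0, 1, 1, 1]

freqs = knn_class_frequencies(train_x, train_y, (0.2, 0.1), k=3)
print(freqs)
```

With k = 3 the only possible outputs for a class are 0, 1/3, 2/3 and 1, and a different k would generally give a different answer. These are the two difficulties the abstract says the probabilistic framework is meant to address.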
Pages: 295-306
Number of pages: 12