Feature selection and weighting by nearest neighbor ensembles

Cited by: 12
Authors
Gertheiss, Jan [1]
Tutz, Gerhard [1]
Affiliations
[1] Univ Munich, D-80799 Munich, Germany
Keywords
Nearest neighbor methods; Variable selection; Ensemble methods; Classification; PATTERN-RECOGNITION; CLASSIFICATION; DISCRIMINATION;
DOI
10.1016/j.chemolab.2009.07.004
CLC classification
TP [Automation and computer technology]
Discipline code
0812
Abstract
In the field of statistical discrimination, nearest neighbor methods are a well-known, quite simple but successful nonparametric classification tool. If the number of predictors increases, however, predictive power normally deteriorates. In general, if some covariates are assumed to be noise variables, variable selection is a promising approach. The paper's main focus is on the development and evaluation of a nearest neighbor ensemble with implicit variable selection. In contrast to other nearest neighbor approaches, we are not primarily interested in classification, but in estimating the (posterior) class probabilities. In simulation studies and for real-world data, the proposed nearest neighbor ensemble is compared to an extended forward/backward variable selection procedure for nearest neighbor classifiers, and to some alternative well-established classification tools (that offer probability estimates as well). Despite its simple structure, the proposed method's performance is quite good, especially if relevant covariates can be separated from noise variables. Another advantage of the presented ensemble is the easy identification of interactions, which are usually hard to detect. So not simply variable selection but rather some kind of feature selection is performed. (C) 2009 Elsevier B.V. All rights reserved.
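The core idea described in the abstract can be illustrated with a minimal sketch (not the authors' code): each feature gets its own one-dimensional nearest neighbor probability estimator, and the ensemble combines these per-feature estimates with weights, so that a small weight effectively deselects a noise feature. The toy data, the weight values, and all function names here are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of a per-feature nearest neighbor ensemble for
# class-probability estimation. Toy data and weights are illustrative.

def knn_prob(train_x, train_y, x, k=3):
    """Estimate P(class = 1 | x) from the k nearest points on ONE feature."""
    neighbors = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in neighbors) / k

def ensemble_prob(train, x, weights, k=3):
    """Weighted average of the per-feature kNN probability estimates."""
    xs = [row for row, _ in train]
    ys = [y for _, y in train]
    probs = [knn_prob([row[j] for row in xs], ys, x[j], k)
             for j in range(len(x))]
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
train = [((0.1, 0.9), 0), ((0.2, 0.1), 0), ((0.3, 0.5), 0),
         ((0.8, 0.4), 1), ((0.9, 0.8), 1), ((1.0, 0.2), 1)]

# Equal weights vs. weights down-weighting the noise feature:
p_equal = ensemble_prob(train, (0.85, 0.5), [0.5, 0.5])      # ~0.833
p_weighted = ensemble_prob(train, (0.85, 0.5), [0.9, 0.1])   # ~0.967
```

In the paper's setting the weights are not fixed by hand; the point of the sketch is only that shrinking a feature's weight toward zero performs implicit feature selection while the ensemble still outputs a probability rather than a hard class label.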
Pages: 30-38
Number of pages: 9