Resampling methods for quality assessment of classifier performance and optimal number of features

Cited by: 5
Authors
Fandos, Raquel [1]
Debes, Christian [2 ]
Zoubir, Abdelhak M. [1 ]
Affiliations
[1] Tech Univ Darmstadt, Inst Telecommun, Signal Proc Grp, D-64283 Darmstadt, Germany
[2] AGT Grp R&D GmbH, D-64295 Darmstadt, Germany
Keywords
Pattern recognition; Resampling; Bootstrap; Classifier design and evaluation; Feature evaluation and selection; Optimal dimensionality; FEATURE-SELECTION; CROSS-VALIDATION; NEURAL-NETWORK; BOOTSTRAP; PROBABILITY; RECOGNITION; PREDICTION;
DOI
10.1016/j.sigpro.2013.05.004
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification codes
0808; 0809;
Abstract
We address two fundamental design issues of a classification system: the choice of the classifier and the dimensionality of the optimal feature subset. Resampling techniques are applied to estimate both the probability distribution of the misclassification rate (or any other figure of merit of a classifier) subject to the size of the feature set, and the probability distribution of the optimal dimensionality given a classification system and a misclassification rate. The latter allows for the estimation of confidence intervals for the optimal feature set size. Based on the former, a quality assessment for the classifier performance is proposed. Traditionally, the comparison of classification systems is accomplished for a fixed feature set. However, a different set may provide different results. The proposed method compares the classifiers independently of any pre-selected feature set. The algorithms are tested on 80 sets of synthetic examples and six standard databases of real data. The simulated data results are verified by an exhaustive search of the optimum and by two feature selection algorithms for the real data sets. (C) 2013 Elsevier B.V. All rights reserved.
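To make the approach concrete, the sketch below illustrates (it is not the authors' implementation) how bootstrap resampling can approximate the distribution of the misclassification rate as a function of the feature-subset size and, from it, the distribution of the optimal dimensionality with a confidence interval. The synthetic data set, the linear discriminant classifier, the univariate feature ranking, and all parameter values are illustrative assumptions, using only NumPy and scikit-learn.

# Minimal bootstrap sketch (illustrative assumptions, not the paper's code):
# estimate the distribution of the misclassification rate per feature-subset
# size and the distribution of the optimal dimensionality.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=20, n_informative=8, random_state=0)
n, d = X.shape
B = 200                                  # number of bootstrap replications
err = np.zeros((B, d))                   # misclassification rate per (replicate, subset size)

for b in range(B):
    idx = rng.integers(0, n, size=n)         # bootstrap sample, drawn with replacement
    oob = np.setdiff1d(np.arange(n), idx)    # out-of-bag samples used for testing
    Xb, yb = X[idx], y[idx]
    # simple univariate ranking on the bootstrap sample (an illustrative choice)
    scores = np.abs(Xb[yb == 0].mean(0) - Xb[yb == 1].mean(0)) / (Xb.std(0) + 1e-12)
    order = np.argsort(scores)[::-1]
    for k in range(1, d + 1):
        cols = order[:k]
        clf = LinearDiscriminantAnalysis().fit(Xb[:, cols], yb)
        err[b, k - 1] = np.mean(clf.predict(X[np.ix_(oob, cols)]) != y[oob])

# empirical distribution of the optimal number of features and a 90% interval
k_opt = err.argmin(axis=1) + 1
lo, hi = np.percentile(k_opt, [5, 95])
print(f"optimal dimensionality: median={np.median(k_opt):.0f}, 90% interval=[{lo:.0f}, {hi:.0f}]")

The spread of k_opt over the bootstrap replicates is what yields a confidence interval for the optimal feature set size, and each column of err gives the per-dimensionality distribution of the figure of merit that the abstract refers to.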
Pages: 2956-2968
Page count: 13