Resampling methods for parameter-free and robust feature selection with mutual information

Cited by: 101
Authors
Francois, D.
Rossi, F.
Wertz, V.
Verleysen, M.
Affiliations
[1] Catholic Univ Louvain, Machine Learning Grp, DICE, B-1348 Louvain, Belgium
[2] Catholic Univ Louvain, Machine Learning Grp, CESAME, B-1348 Louvain, Belgium
[3] INRIA, Projet AxIS, F-78153 Le Chesnay, France
Keywords
mutual information; permutation test; feature selection
DOI
10.1016/j.neucom.2006.11.019
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Combining the mutual information criterion with a forward feature selection strategy offers a good trade-off between optimality of the selected feature subset and computation time. However, it requires setting the parameter(s) of the mutual information estimator and determining when to halt the forward procedure. These two choices are difficult to make because, as the dimensionality of the subset increases, the estimation of the mutual information becomes less and less reliable. This paper proposes using resampling methods, namely K-fold cross-validation and the permutation test, to address both issues. The resampling methods provide information about the variance of the estimator, which can then be used to set the parameter automatically and to calculate a threshold for stopping the forward procedure. The procedure is illustrated on a synthetic data set as well as on real-world examples. (c) 2007 Elsevier B.V. All rights reserved.
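The permutation-test idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the histogram-based estimator, the function names `histogram_mi` and `permutation_threshold`, the bin count, the 95% quantile, and the toy data are all assumptions chosen for illustration. The sketch shuffles the target to break any dependence, estimates mutual information on each shuffled copy, and uses a high quantile of those values as a threshold below which a feature would not be considered informative.

```python
import numpy as np


def histogram_mi(x, y, bins=8):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram.

    Assumption: a simple histogram estimator stands in for whatever
    estimator the paper actually uses.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def permutation_threshold(x, y, n_perm=200, quantile=0.95, bins=8, seed=0):
    """Quantile of the MI estimate under the null hypothesis of independence.

    Each permutation of y destroys the x-y dependence while keeping both
    marginal distributions, so the resulting MI values reflect only
    estimator bias and variance.
    """
    rng = np.random.default_rng(seed)
    null = [histogram_mi(x, rng.permutation(y), bins) for _ in range(n_perm)]
    return float(np.quantile(null, quantile))


# Toy data (hypothetical): one feature drives the target, the other is noise.
rng = np.random.default_rng(42)
n = 500
relevant = rng.normal(size=n)
noise = rng.normal(size=n)
target = np.sin(2 * relevant) + 0.1 * rng.normal(size=n)

for name, feature in [("relevant", relevant), ("noise", noise)]:
    mi = histogram_mi(feature, target)
    thr = permutation_threshold(feature, target)
    print(f"{name}: MI={mi:.3f}, threshold={thr:.3f}, keep={mi > thr}")
```

In a forward selection loop, the same comparison could serve as a stopping rule: the search halts once the best candidate's estimated mutual information no longer exceeds its permutation threshold.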
Pages: 1276-1288
Number of pages: 13