The mRMR variable selection method: a comparative study for functional data

Cited by: 40
Authors
Berrendero, J. R. [1 ]
Cuevas, A. [1 ]
Torrecilla, J. L. [1 ]
Affiliations
[1] Univ Autonoma Madrid, Dept Matemat, Madrid, Spain
Keywords
functional data analysis; supervised classification; distance correlation; variable selection; mutual information; relevance
DOI
10.1080/00949655.2015.1042378
Chinese Library Classification (CLC): TP39 [Computer applications]
Discipline codes: 081203; 0835
Abstract
The use of variable selection methods is particularly appealing in statistical problems with functional data. The obvious general criterion for variable selection is to choose the 'most representative' or 'most relevant' variables. However, it is also clear that a purely relevance-oriented criterion could lead to selecting many redundant variables. The minimum Redundancy Maximum Relevance (mRMR) procedure, proposed by Ding and Peng [Minimum redundancy feature selection from microarray gene expression data. J Bioinform Comput Biol. 2005;3:185-205] and Peng et al. [Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell. 2005;27:1226-1238], is an algorithm to systematically perform variable selection, achieving a reasonable trade-off between relevance and redundancy. In its original form, this procedure is based on the use of the so-called mutual information criterion to assess relevance and redundancy. Keeping the focus on functional data problems, we propose here a modified version of the mRMR method, obtained by replacing the mutual information with the new association measure (called distance correlation) suggested by Szekely et al. [Measuring and testing dependence by correlation of distances. Ann Statist. 2007;35:2769-2794]. We have also performed an extensive simulation study, including 1600 functional experiments and three real-data examples, aimed at comparing the different versions of the mRMR methodology. The results are quite conclusive in favour of the new proposed alternative.
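The abstract describes a greedy trade-off between relevance to the response and redundancy among selected variables, with distance correlation as the association measure. The following is a minimal Python sketch of that idea, not the authors' exact implementation: `dcor` is a direct empirical distance-correlation estimator (Szekely et al., 2007), and `mrmr_dcor` uses the additive "relevance minus mean redundancy" form of the mRMR score; the paper's precise variant and tuning may differ.

```python
import numpy as np

def dcor(x, y):
    """Empirical distance correlation (Szekely et al., 2007)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    # pairwise Euclidean distance matrices
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    # double centering: subtract row/column means, add back the grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()  # squared distance covariance (V-statistic)
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

def mrmr_dcor(X, y, k):
    """Greedy mRMR sketch: select k columns of X, each maximizing
    relevance to y minus mean redundancy with columns already chosen."""
    n, p = X.shape
    relevance = np.array([dcor(X[:, j], y) for j in range(p)])
    selected = [int(np.argmax(relevance))]  # start with the most relevant
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(p):
            if j in selected:
                continue
            redundancy = np.mean([dcor(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

In a functional-data setting each column of `X` would hold the curves evaluated at one grid point, so the selected indices correspond to selected time points. The O(n^2) distance matrices make this sketch practical only for moderate sample sizes.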
Pages: 891-907 (17 pages)
Related papers (24 in total)
[1] Baillo A., Cuevas A., Cuesta-Albertos J.A. Supervised classification for a family of Gaussian functional models. Scandinavian Journal of Statistics, 2011, 38(3): 480-498.
[2] Battiti R. Using mutual information for selecting features in supervised neural-net learning. IEEE Transactions on Neural Networks, 1994, 5(4): 537-550.
[3] Cao R., Cuevas A., Manteiga W.G. A comparative study of several smoothing methods in density estimation. Computational Statistics & Data Analysis, 1994, 17(2): 153-176.
[4] Demler O.V., Pencina M.J., D'Agostino R.B. Sr. Impact of correlation on predictive ability of biomarkers. Statistics in Medicine, 2013, 32(24): 4196-4210.
[5] Ding C. Minimum redundancy feature selection from microarray gene expression data. Journal of Bioinformatics and Computational Biology, 2005, 3: 185-205. DOI 10.1142/S0219720005001004.
[6] Estevez P.A., Tesmer M., Perez C.A., Zurada J.A. Normalized mutual information feature selection. IEEE Transactions on Neural Networks, 2009, 20(2): 189-201.
[7] Fan R.E. Journal of Machine Learning Research, 2008, 9: 1871.
[8] Ferraty F. Springer Series in Statistics, 2006.
[9] Golub T.R., Slonim D.K., Tamayo P., Huard C., Gaasenbeek M., Mesirov J.P., Coller H., Loh M.L., Downing J.R., Caligiuri M.A., Bloomfield C.D., Lander E.S. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 1999, 286(5439): 531-537.
[10] Guyon I. Studies in Fuzziness and Soft Computing, 2006.