Experimental Analysis of Feature Selection Stability for High-Dimension and Low-Sample Size Gene Expression Classification Task

Cited by: 0
Authors
Dernoncourt, David [1 ,2 ]
Hanczar, Blaise [3 ]
Zucker, Jean-Daniel [1 ,2 ,3 ,4 ]
Affiliations
[1] Ctr Rech Cordeliers, Inst Natl Sante & Rech Med, U872, Equipe 7, F-75006 Paris, France
[2] Univ Paris 06, F-75006 Paris, France
[3] Univ Paris 05, LIPADE, F-75006 Paris, France
[4] UMMISCO, IRD, UMI 209, F-93143 Bondy, France
Source
IEEE 12TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS & BIOENGINEERING | 2012
Keywords
Feature selection; small sample; stability; dimensionality/sample size ratio; MICROARRAY; SURVIVAL; CANCER;
DOI
Not available
Chinese Library Classification number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Gene selection is a crucial step when building a classifier from microarray or metagenomic data. Because the number of observations is small, gene selection tends to be unstable: two gene subsets obtained from different datasets that address the same classification problem often do not overlap significantly. Although this is a crucial problem, little work has been done on selection stability. In this paper, we first present several stability quantification methods, then study how those measures, as well as the resulting classification performance, vary with several parameters (dimensionality, sample size, feature distribution, selection threshold) on both artificial and real data. Feature selection was performed with a t-test and classification with linear discriminant analysis. We point out a strong empirical correlation between the dimensionality/sample size ratio and selection instability.
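
The kind of experiment the abstract describes can be illustrated with a minimal sketch. The abstract does not say which stability measures the paper uses, so the mean pairwise Jaccard index below is an assumption chosen for illustration. The function names, the stratified subsampling scheme, and the synthetic Gaussian data are likewise hypothetical and not taken from the paper. Features are ranked by the absolute Welch t-statistic, the top k are kept, and the overlap of the selected subsets is averaged over random subsamples.

```python
import numpy as np

def t_scores(X, y):
    """Absolute Welch t-statistic of each feature for a two-class problem."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

def select_top_k(X, y, k):
    """Set of indices of the k features with the largest |t|."""
    return set(np.argsort(t_scores(X, y))[-k:].tolist())

def jaccard(s1, s2):
    """Jaccard index: size of the intersection over size of the union."""
    return len(s1 & s2) / len(s1 | s2)

def selection_stability(X, y, k, n_resamples=20, frac=0.8, seed=0):
    """Mean pairwise Jaccard index of top-k subsets over stratified subsamples.

    Illustrative stability measure (an assumption, not the paper's definition).
    """
    rng = np.random.default_rng(seed)
    subsets = []
    for _ in range(n_resamples):
        # Draw a fraction of each class without replacement (at least 2 per class).
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c),
                       size=max(2, int(frac * np.sum(y == c))),
                       replace=False)
            for c in (0, 1)
        ])
        subsets.append(select_top_k(X[idx], y[idx], k))
    scores = [jaccard(subsets[i], subsets[j])
              for i in range(len(subsets))
              for j in range(i + 1, len(subsets))]
    return float(np.mean(scores))

def make_data(n, d, n_informative=10, shift=1.0, seed=0):
    """Hypothetical synthetic data: Gaussian noise, first features shifted in class 1."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n // 2)
    X = rng.normal(size=(len(y), d))
    X[y == 1, :n_informative] += shift
    return X, y

if __name__ == "__main__":
    # Same sample size, increasing dimensionality/sample size ratio.
    for d in (50, 500, 5000):
        X, y = make_data(n=40, d=d)
        print(d / 40, selection_stability(X, y, k=10))
```

Running the script prints the stability score for three dimensionality/sample size ratios at a fixed sample size, which is one simple way to probe the relationship the abstract reports.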
Pages: 350-355
Number of pages: 6