Multiclass classification of microarray data samples with a reduced number of genes

Cited by: 10
Authors
Tapia, Elizabeth [1, 2]
Ornella, Leonardo [1]
Bulacio, Pilar [1, 2]
Angelone, Laura [1, 2]
Affiliations
[1] CIFASIS Conicet Inst, Rosario, Santa Fe, Argentina
[2] Natl Univ Rosario, Fac Cs Exactas & Ingn, Rosario, Santa Fe, Argentina
Keywords
FEATURE-SELECTION; CANCER; PREDICTION; RULES; BIAS
DOI
10.1186/1471-2105-12-59
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in bioinformatics research, and it becomes harder as the number of classes increases. In addition, the performance of most classifiers is tightly linked to the effectiveness of the mandatory gene selection step. Critical to gene selection is an estimate of the maximum number of genes that a classification algorithm can handle. Without such an estimate, one is left with either a computationally demanding exploration of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained.
Results: A novel bound is presented on the maximum number of genes that can be handled by the binary classifiers in binary-mediated multiclass classification algorithms for microarray data samples. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary-mediated multiclass classifiers for microarray data samples.
Conclusions: Comprehensive experimental work shows that the bound is indeed useful for inducing accurate and sparse multiclass classifiers for microarray data samples.
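As a rough illustration of what "binary-mediated multiclass classification with a reduced number of genes" can look like, the sketch below decomposes a multiclass problem into binary ones, caps the number of genes each binary classifier may use, and decodes the binary outputs by Hamming distance. It is not the authors' algorithm: the abstract does not state the bound, so the cap `k` is a hypothetical placeholder, and the one-vs-rest coding, mean-difference gene ranking, nearest-centroid binary classifiers, toy data, and all function names are assumptions made for illustration only.

```python
from statistics import mean


def one_vs_rest_code(classes):
    """Code matrix: one codeword per class, one +1/-1 entry per binary problem."""
    return {c: [1 if c == d else -1 for d in classes] for c in classes}


def top_k_genes(X, y_bin, k):
    """Rank genes by absolute difference of the two class means; keep at most k."""
    pos = [x for x, t in zip(X, y_bin) if t == 1]
    neg = [x for x, t in zip(X, y_bin) if t == -1]
    scores = []
    for g in range(len(X[0])):
        diff = abs(mean(r[g] for r in pos) - mean(r[g] for r in neg))
        scores.append((diff, g))
    scores.sort(key=lambda s: (-s[0], s[1]))  # largest difference first
    return [g for _, g in scores[:k]]


def fit_binary(X, y_bin, k):
    """Nearest-centroid binary classifier restricted to the k selected genes."""
    genes = top_k_genes(X, y_bin, k)
    cpos = [mean(x[g] for x, t in zip(X, y_bin) if t == 1) for g in genes]
    cneg = [mean(x[g] for x, t in zip(X, y_bin) if t == -1) for g in genes]
    return genes, cpos, cneg


def predict_binary(model, x):
    """Return +1 if x is at least as close to the positive centroid, else -1."""
    genes, cpos, cneg = model
    dpos = sum((x[g] - c) ** 2 for g, c in zip(genes, cpos))
    dneg = sum((x[g] - c) ** 2 for g, c in zip(genes, cneg))
    return 1 if dpos <= dneg else -1


def fit_multiclass(X, y, k):
    """Train one gene-capped binary classifier per column of the code matrix."""
    classes = sorted(set(y))
    code = one_vs_rest_code(classes)
    models = []
    for j in range(len(classes)):
        y_bin = [code[label][j] for label in y]
        models.append(fit_binary(X, y_bin, k))
    return code, models


def predict_multiclass(code, models, x):
    """Pick the class whose codeword is nearest in Hamming distance."""
    bits = [predict_binary(m, x) for m in models]
    return min(code, key=lambda c: sum(b != w for b, w in zip(bits, code[c])))


# Toy expression matrix (hypothetical): 6 samples x 4 genes, 3 classes.
X = [
    [5.0, 0.1, 0.2, 1.0],
    [4.8, 0.0, 0.1, 1.1],
    [0.1, 5.1, 0.2, 0.9],
    [0.2, 4.9, 0.0, 1.0],
    [0.0, 0.2, 5.0, 1.1],
    [0.1, 0.1, 5.2, 0.9],
]
y = ["A", "A", "B", "B", "C", "C"]

k = 1  # hypothetical per-classifier gene cap, not the bound from the paper
code, models = fit_multiclass(X, y, k)
print(predict_multiclass(code, models, [4.9, 0.2, 0.1, 1.0]))
```

In this sketch each binary classifier sees only its own `k` genes, so the overall model stays sparse even though different binary problems may pick different genes. Longer codewords (more binary problems per class) would correspond to the higher-dimensional binary output domains mentioned in the abstract.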
Pages: 13