A novel ensemble machine learning for robust microarray data classification

Cited by: 92
Authors
Peng, Yonghong [1]
Affiliations
[1] Univ Bradford, Dept Comp, Bradford BD7 1DP, W Yorkshire, England
Keywords
microarray data; machine learning; ensemble learning; classification
DOI
10.1016/j.compbiomed.2005.04.001
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Microarray data analysis and classification have been convincingly shown to provide an effective methodology for diagnosing diseases and cancers. Although much research in recent years has applied machine learning techniques to microarray data classification, conventional machine learning techniques have been shown to have intrinsic drawbacks in achieving accurate and robust classification. This paper presents a novel ensemble machine learning approach for developing robust microarray data classifiers. Unlike conventional ensemble learning techniques, the approach first generates a pool of candidate base classifiers by gene sub-sampling, and then selects a subset of appropriate base classifiers, using classifier clustering, to construct the classification committee. Experimental results demonstrate that the classifiers constructed by the proposed method outperform not only those generated by conventional machine learning but also those generated by two widely used conventional ensemble learning methods (bagging and boosting). (c) 2005 Elsevier Ltd. All rights reserved.
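The pipeline outlined in the abstract (gene sub-sampling to build a pool, classifier clustering to pick a committee, then a committee vote) can be illustrated with a minimal sketch. The abstract does not name the base learner, the clustering algorithm, or the selection rule, so everything below is an assumption made for illustration: a nearest-centroid base classifier, average-linkage clustering on Hamming distance between training-set predictions, choosing the most accurate member of each cluster, and majority voting. All function and parameter names are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def train_centroid(X, y, genes):
    """Assumed base learner: nearest centroid restricted to the given gene indices."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]

    def predict(x):
        # Return the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((x[g] - m) ** 2 for g, m in zip(genes, centroids[c])),
        )

    return predict


def hamming(a, b):
    """Number of samples on which two classifiers disagree."""
    return sum(p != q for p, q in zip(a, b))


def cluster_classifiers(outputs, k):
    """Assumed clustering step: greedy average-linkage merging down to k clusters."""
    clusters = [[i] for i in range(len(outputs))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    hamming(outputs[a], outputs[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


def build_committee(X, y, pool_size=20, genes_per_model=3, k=5, seed=0):
    """Build a pool by gene sub-sampling, then keep one classifier per cluster."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pool = [
        train_centroid(X, y, rng.sample(range(n_genes), genes_per_model))
        for _ in range(pool_size)
    ]
    # Each classifier is described by its predictions on the training samples.
    outputs = [[clf(x) for x in X] for clf in pool]
    accuracy = [sum(p == t for p, t in zip(out, y)) / len(y) for out in outputs]
    clusters = cluster_classifiers(outputs, k)
    # Assumed selection rule: the most accurate member represents each cluster.
    return [pool[max(c, key=lambda i: accuracy[i])] for c in clusters]


def committee_predict(committee, x):
    """Majority vote over the selected base classifiers."""
    return Counter(clf(x) for clf in committee).most_common(1)[0][0]
```

Here `X` is a list of expression profiles (one list of gene values per sample) and `y` the matching class labels. Picking one classifier per cluster is one simple way to keep the committee diverse; in practice, accuracy would be estimated on held-out samples rather than on the training set as in this sketch.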
Pages: 553-573
Number of pages: 21