A genetic programming-based approach to the classification of multiclass microarray datasets

Cited by: 51
Authors
Liu, Kun-Hong [1, 2, 4]
Xu, Chun-Gui [2, 3]
Affiliations
[1] Xiamen Univ, Sch Software, Xiamen 361005, Fujian, Peoples R China
[2] Chinese Acad Sci, Hefei Inst Intelligent Machines, Intelligent Comp Lab, Hefei 230031, Anhui, Peoples R China
[3] Univ Sci & Technol China, Sch Life Sci, Hefei 230026, Anhui, Peoples R China
[4] Univ Sci & Technol China, Dept Automat, Hefei 230026, Anhui, Peoples R China
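Funding
China Postdoctoral Science Foundation; National High Technology Research and Development Program of China (863 Program); US National Science Foundation
Keywords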
MOLECULAR CLASSIFICATION; FEATURE-SELECTION; CANCER; PREDICTION; CLASSIFIERS; ALGORITHMS; DISCOVERY;
DOI
10.1093/bioinformatics/btn644
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Feature selection approaches have been widely applied to deal with the small sample size problem in the analysis of microarray datasets. For the multiclass problem, previously proposed methods are based on the idea of selecting a single gene subset to distinguish all classes. However, it will be more effective to split a multiclass problem into a set of two-class problems and to solve each of them with its own classification system.
Results: We propose a genetic programming (GP)-based approach to analyze multiclass microarray datasets. Unlike in traditional GP, each individual proposed in this article consists of a set of small-scale ensembles, called sub-ensembles (SEs). Each SE consists of a set of trees. In application, a multiclass problem is divided into a set of two-class problems, each of which is first tackled by an SE. The SEs tackling the respective two-class problems are then combined to construct a GP individual, so each individual can deal with a multiclass problem directly. Effective methods are proposed to solve the problems arising in the fusion of SEs, and a greedy algorithm is designed to keep diversity high within SEs. This GP is tested on five datasets. The results show that the proposed method effectively performs both the feature selection and the classification tasks.
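The individual structure described in the abstract (one sub-ensemble of trees per two-class problem, fused into a single multiclass predictor) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the paper: the one-vs-rest decomposition, the rule that a tree votes "positive" when its output is above zero, the fusion by highest vote fraction, and the hand-written lambda functions that stand in for evolved GP trees. The names `se_support`, `predict` and `individual` are hypothetical, and no evolution, fitness function or diversity-preserving step is shown.

```python
from typing import Callable, List, Sequence

# Stand-in for an evolved GP expression tree: maps a sample to a real value.
Tree = Callable[[Sequence[float]], float]


def se_support(sub_ensemble: List[Tree], sample: Sequence[float]) -> float:
    """Fraction of trees in one sub-ensemble (SE) whose output is > 0."""
    votes = sum(1 for tree in sub_ensemble if tree(sample) > 0)
    return votes / len(sub_ensemble)


def predict(individual: List[List[Tree]], sample: Sequence[float]) -> int:
    """Fuse the SEs of one individual into a multiclass decision.

    Assumption: one SE per class (one-vs-rest); the class whose SE gives
    the largest support wins, and ties go to the lowest class index.
    """
    supports = [se_support(se, sample) for se in individual]
    return max(range(len(supports)), key=supports.__getitem__)


# Toy three-class individual over two hypothetical expression features.
individual = [
    # SE for class 0 vs rest
    [lambda x: x[0] - 1.0, lambda x: x[0] - x[1], lambda x: x[0] - 0.5],
    # SE for class 1 vs rest
    [lambda x: x[1] - 1.0, lambda x: x[1] - x[0], lambda x: x[1] - 0.5],
    # SE for class 2 vs rest
    [lambda x: 0.5 - x[0] - x[1], lambda x: 0.2 - x[0], lambda x: 0.2 - x[1]],
]

if __name__ == "__main__":
    for sample in ([2.0, 0.1], [0.1, 2.0], [0.0, 0.0]):
        print(sample, "->", predict(individual, sample))
```

The tie-breaking rule in this sketch is the simplest possible choice; the abstract states that the paper proposes its own methods for the problems arising in SE fusion, and those are not reproduced here.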
Pages: 331-337
Number of pages: 7