Optimization Based Tumor Classification from Microarray Gene Expression Data

Cited by: 56
Authors
Dagliyan, Onur [1]
Uney-Yuksektepe, Fadime [2]
Kavakli, I. Halil [1]
Turkay, Metin [3]
Affiliations
[1] Koc Univ, Dept Chem & Biol Engn, Istanbul, Turkey
[2] Istanbul Kultur Univ, Dept Ind Engn, Istanbul, Turkey
[3] Koc Univ, Dept Ind Engn, Istanbul, Turkey
Source
PLOS ONE | 2011 / Vol. 6 / Issue 02
Keywords
BAYESIAN VARIABLE SELECTION; PARTIAL LEAST-SQUARES; B-CELL LYMPHOMAS; PROSTATE-CANCER; LOGISTIC-REGRESSION; PREDICTION; LEUKEMIA; BINDING; IDENTIFICATION; ORGANIZATION
DOI
10.1371/journal.pone.0014579
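Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract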
Background: An important use of microarray data is the classification of tumor types according to genes that are up- or down-regulated in specific cancer types. A number of algorithms have been proposed to obtain such classifications. These algorithms usually require parameter optimization to obtain accurate results, depending on the type of data. Additionally, it is critical to find an optimal set of markers among those up- or down-regulated genes that can be used clinically to build assays for diagnosing specific cancer types or following their progression. In this paper, we employ a mixed-integer-programming-based classification algorithm, the hyper-box enclosure (HBE) method, to classify several cancer types with a minimal set of predictor genes. This optimization-based method, which is a user-friendly and efficient classifier, may allow clinicians to diagnose certain cancer types and follow their progression.
Methodology/Principal Findings: We apply the HBE algorithm to well-known data sets such as leukemia, prostate cancer, diffuse large B-cell lymphoma (DLBCL), and small round blue cell tumors (SRBCT) to find predictor genes that can be used for diagnosis and prognosis robustly and with high accuracy. Our approach does not require any modification or parameter optimization for each data set. Additionally, the information gain attribute evaluator, the relief attribute evaluator, and correlation-based feature selection are employed for gene selection. The results are compared with those from other studies, and the biological roles of the selected genes in the corresponding cancer types are described.
Conclusions/Significance: Overall, our algorithm performed better than the other algorithms reported in the literature and the classifiers found in the WEKA data-mining package. Because it does not require parameter optimization and consistently achieves a very high prediction rate on different types of data sets, the HBE method is an effective and consistent tool for cancer type prediction with a small number of gene markers.
Pages: 10