Penalized logistic regression with the adaptive LASSO for gene selection in high-dimensional cancer classification

Cited by: 122
Authors
Algamal, Zakariya Yahya [1]
Lee, Muhammad Hisyam [1]
Affiliations
[1] Univ Teknol Malaysia, Dept Math Sci, Skudai 81310, Johor, Malaysia
Keywords
Adaptive LASSO; Penalized logistic regression; Cancer classification; Gene selection; VARIABLE SELECTION; REGULARIZATION; ALGORITHM; MICROARRAYS; PREDICTION; PENALTY;
DOI
10.1016/j.eswa.2015.08.016
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
An important application of DNA microarray data is cancer classification. Because microarray data are high-dimensional, gene selection approaches are often employed to support expert systems in diagnosing cancer with high classification accuracy. Penalized logistic regression using the least absolute shrinkage and selection operator (LASSO) is one of the key approaches in high-dimensional cancer classification, as it performs gene coefficient estimation and gene selection simultaneously. However, the LASSO has been criticized for being biased in gene selection. The adaptive LASSO (APLR) was originally proposed to overcome this selection bias by assigning a consistent weight to each gene. In high-dimensional data, however, the adaptive LASSO faces practical problems in choosing the type of initial weight. In practice, the LASSO estimator itself has been used as the initial weight, but this may not be preferable because the LASSO itself is inconsistent. To address this issue, an alternative initial weight in adaptive penalized logistic regression (CBPLR) is proposed. The effectiveness of the CBPLR is examined on three well-known high-dimensional cancer classification datasets using the number of selected genes, the area under the curve, and the misclassification rate. The experimental results reveal that the proposed CBPLR is efficient and feasible for cancer classification. Additionally, the proposed weight is compared with APLR and LASSO and exhibits competitive performance in both classification accuracy and gene selection. The proposed CBPLR has a significant impact in penalized logistic regression by selecting fewer genes while achieving a high area under the curve and a low misclassification rate. Thus, the proposed weight could conceivably be used in other research that implements gene selection in high-dimensional cancer classification. © 2015 Elsevier Ltd. All rights reserved.
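The two-stage idea described in the abstract (fit an initial estimator, turn it into per-gene weights, then refit with a weighted L1 penalty) can be illustrated with a minimal sketch. It makes several assumptions that are not in the record: it uses the LASSO-based initial weight that the abstract says is common in practice (the APLR baseline), not the CBPLR weight, whose definition is not given here; the function names, the proximal-gradient solver, the omission of an intercept, the tuning values, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of t * |z| (shrinks z toward zero by t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_weighted_l1_logistic(X, y, lam, weights, n_iter=500, step=0.1):
    """Minimise mean logistic loss + lam * sum_j weights[j] * |beta[j]|
    by proximal gradient descent (no intercept, to keep the sketch short)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
            prob = 1.0 / (1.0 + math.exp(-eta))
            for j in range(p):
                grad[j] += (prob - yi) * xi[j] / n
        beta = [
            soft_threshold(beta[j] - step * grad[j], step * lam * weights[j])
            for j in range(p)
        ]
    return beta


def adaptive_lasso_logistic(X, y, lam_init, lam, gamma=1.0, eps=1e-8):
    """Stage 1: plain LASSO fit. Stage 2: refit with weights 1 / |beta_init|^gamma.
    Assumption: LASSO-based initial weights (APLR-style), not the CBPLR weight."""
    p = len(X[0])
    beta_init = fit_weighted_l1_logistic(X, y, lam_init, [1.0] * p)
    weights = [1.0 / (abs(b) + eps) ** gamma for b in beta_init]
    beta = fit_weighted_l1_logistic(X, y, lam, weights)
    return beta_init, beta


# Synthetic stand-in for expression data: only "genes" 0 and 1 drive the label.
random.seed(0)
n, p = 60, 8
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if 2.0 * row[0] - 2.0 * row[1] + random.gauss(0, 0.5) > 0 else 0 for row in X]

beta_init, beta = adaptive_lasso_logistic(X, y, lam_init=0.05, lam=0.02)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected gene indices:", selected)
```

In this sketch, a gene whose initial coefficient is exactly zero receives a very large weight and therefore stays excluded in the second stage, which is why the choice of initial estimator matters so much in high-dimensional settings.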
Pages: 9326-9332
Page count: 7