Regularized logistic regression without a penalty term: An application to cancer classification with microarray data

Cited by: 45
Authors
Bielza, Concha [1]
Robles, Victor [2]
Larranaga, Pedro [1]
Affiliations
[1] Tech Univ Madrid, Dept Artificial Intelligence, Madrid, Spain
[2] Tech Univ Madrid, Dept Comp Architecture & Technol, Madrid, Spain
Funding
US National Institutes of Health (NIH)
Keywords
Logistic regression; Regularization; Estimation of distribution algorithms; Cancer classification; Microarray data; PARTIAL LEAST-SQUARES; GENE SELECTION; DISEASE CLASSIFICATION; TUMOR CLASSIFICATION; REDUCTION; ALGORITHM; CLASSIFIERS; PREDICTION; WRAPPERS; LASSO;
DOI
10.1016/j.eswa.2010.09.140
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Regularized logistic regression is a useful classification method for problems with few samples and a huge number of variables. This regression needs to determine the regularization term, which amounts to searching for the optimal penalty parameter and the norm of the regression coefficient vector. This paper presents a new regularized logistic regression method based on the evolution of the regression coefficients using estimation of distribution algorithms. The main novelty is that it avoids the determination of the regularization term. The chosen simulation method of new coefficients at each step of the evolutionary process guarantees their shrinkage as an intrinsic regularization. Experimental results comparing the behavior of the proposed method with Lasso and ridge logistic regression in three cancer classification problems with microarray data are shown. (C) 2010 Elsevier Ltd. All rights reserved.
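The general idea in the abstract, evolving the coefficient vector with an estimation of distribution algorithm (EDA) instead of maximizing a penalized likelihood, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple univariate Gaussian EDA with truncation selection, and it does not reproduce the paper's specific simulation scheme for new coefficients. The names `log_likelihood` and `umda_logistic` and all parameter defaults are hypothetical.

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Unpenalized logistic log-likelihood; beta[0] is the intercept."""
    z = beta[0] + X @ beta[1:]
    # log(1 + exp(z)) computed stably via logaddexp
    return float(np.sum(y * z - np.logaddexp(0.0, z)))

def umda_logistic(X, y, pop_size=100, n_select=50, n_gen=30, seed=0):
    """Evolve coefficients with a univariate Gaussian EDA (assumed variant)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1] + 1                 # coefficients plus intercept
    mu = np.zeros(d)                   # start the search centred at zero
    sigma = np.ones(d)
    best_beta, best_fit = mu.copy(), log_likelihood(mu, X, y)
    for _ in range(n_gen):
        # Simulate a new population from independent normals
        pop = rng.normal(mu, sigma, size=(pop_size, d))
        fit = np.array([log_likelihood(b, X, y) for b in pop])
        # Truncation selection: keep the n_select fittest individuals
        elite = pop[np.argsort(fit)[-n_select:]]
        if fit.max() > best_fit:
            best_fit = float(fit.max())
            best_beta = pop[np.argmax(fit)].copy()
        # Re-estimate the distribution from the selected individuals
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 1e-6   # small floor avoids collapse
    return best_beta, best_fit
```

Called as `umda_logistic(X, y)` with `X` an `n x p` NumPy array and `y` a 0/1 vector, it returns the best coefficient vector found and its log-likelihood. No penalty parameter or norm is chosen anywhere; coefficients are only ever produced by sampling from the fitted normals.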
Pages: 5110-5118
Page count: 9