Incorporating Predictor Network in Penalized Regression with Application to Microarray Data

Citations: 86
Authors
Pan, Wei [1]
Xie, Benhuai [1]
Shen, Xiaotong [2]
Affiliations
[1] Univ Minnesota, Sch Publ Hlth, Div Biostat, Minneapolis, MN 55455 USA
[2] Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
Keywords
Elastic net; Generalized boosted Lasso; L-1 penalization; Laplacian; Lasso; Microarray gene expression; Penalized likelihood; VARIABLE SELECTION; REGULARIZATION; CLASSIFICATION; EXPRESSION; SHRINKAGE; GENES; MODEL
DOI
10.1111/j.1541-0420.2009.01296.x
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
We consider penalized linear regression, especially for "large p, small n" problems, for which the relationships among predictors are described a priori by a network. A class of motivating examples includes modeling a phenotype through gene expression profiles while accounting for the coordinated functioning of genes in the form of biological pathways or networks. To incorporate the prior knowledge of the similar effect sizes of neighboring predictors in a network, we propose a grouped penalty based on the L-gamma-norm that smooths the regression coefficients of the predictors over the network. The main feature of the proposed method is its ability to automatically realize grouped variable selection and exploit grouping effects. We also discuss the effects of the choices of gamma and of some weights inside the L-gamma-norm. Simulation studies demonstrate the superior finite-sample performance of the proposed method as compared to Lasso, elastic net, and a recently proposed network-based method. The new method performs best in variable selection across all simulation set-ups considered. For illustration, the method is applied to a microarray dataset to predict survival times for some glioblastoma patients using a gene expression dataset and a gene network compiled from some Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.
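As a rough illustration of the kind of grouped penalty the abstract describes, the sketch below sums a weighted L-gamma-norm of the coefficient pair over every edge of the predictor network. The exact functional form, the function name `network_penalty`, the per-predictor `weights` vector, and the toy network are all assumptions made for illustration; the abstract does not state the formula, so this should not be read as the authors' definition.

```python
import numpy as np


def network_penalty(beta, edges, weights, gamma=2.0, lam=1.0):
    """Hypothetical edge-wise grouped penalty (assumed form, not from the source).

    For each edge (i, j) it adds
        (|beta_i|**gamma / w_i + |beta_j|**gamma / w_j) ** (1 / gamma)
    and scales the total by the tuning parameter lam.
    """
    beta = np.asarray(beta, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = 0.0
    for i, j in edges:
        pair = (np.abs(beta[i]) ** gamma / weights[i]
                + np.abs(beta[j]) ** gamma / weights[j])
        total += pair ** (1.0 / gamma)
    return lam * total


# Toy network (assumed): a chain 0 - 1 - 2, plus an isolated predictor 3.
edges = [(0, 1), (1, 2)]
weights = np.ones(4)  # unit weights; a degree-based choice is another option
beta = np.array([3.0, 4.0, 0.0, 0.0])

penalty_l2 = network_penalty(beta, edges, weights, gamma=2.0)
penalty_l1 = network_penalty(beta, edges, weights, gamma=1.0)
```

Under this assumed form with unit weights, gamma = 1 reduces each edge term to a sum of absolute values, so the penalty becomes a Lasso penalty in which each coefficient is counted once per incident edge. For gamma > 1 the two coefficients on an edge are penalized jointly, which is what allows a neighboring pair to be shrunk to zero together.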
Pages: 474-484
Page count: 11