Penalized logistic regression for detecting gene interactions

Cited by: 249
Authors
Park, Mee Young [1]
Hastie, Trevor [2, 3]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
Keywords
discrete factors; gene interactions; high dimensional; logistic regression; L2-regularization
DOI
10.1093/biostatistics/kxm010
Chinese Library Classification
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
We propose using a variant of logistic regression (LR) with L2-regularization to fit gene-gene and gene-environment interaction models. Studies have shown that many common diseases are influenced by the interaction of certain genes. LR models with quadratic penalization not only correctly characterize the influential genes along with their interaction structures but also yield additional benefits in handling high-dimensional, discrete factors with a binary response. We illustrate the advantages of using an L2-regularization scheme and compare its performance with that of "multifactor dimensionality reduction" and "FlexTree," two recent tools for identifying gene-gene interactions. Using simulated and real data sets, we demonstrate that our method outperforms the other methods both in identifying the interaction structures and in prediction accuracy. In addition, we validate the significance of the selected factors through bootstrap analyses.
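The idea of fitting an interaction model for discrete genotype factors with a quadratic penalty can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two three-level genotype factors, the simulated risk pattern, the helper names (`encode`, `fit_l2_logistic`, `predict_proba`), the penalty value, and the use of plain gradient descent in place of whatever fitting procedure the authors use. Each factor level and each pair of levels gets its own indicator column; the L2 penalty on the coefficients keeps this over-parameterized coding estimable.

```python
import math
import random


def encode(g1, g2, levels=3):
    """Indicator coding: one column per level of each factor, plus one per level pair."""
    x = [0.0] * (2 * levels + levels * levels)
    x[g1] = 1.0                                # main effect, gene 1
    x[levels + g2] = 1.0                       # main effect, gene 2
    x[2 * levels + g1 * levels + g2] = 1.0     # gene 1 x gene 2 interaction
    return x


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(b0, beta, x):
    return sigmoid(b0 + sum(b * v for b, v in zip(beta, x)))


def fit_l2_logistic(X, y, lam=1.0, lr=0.5, n_iter=500):
    """Minimize mean negative log-likelihood + (lam / (2n)) * ||beta||^2.

    The intercept b0 is left unpenalized. Plain gradient descent is used
    only to keep the sketch dependency-free.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0 = 0.0
        g = [lam * b / n for b in beta]        # gradient of the penalty term
        for xi, yi in zip(X, y):
            r = predict_proba(b0, beta, xi) - yi
            g0 += r / n
            for j, v in enumerate(xi):
                if v:
                    g[j] += r * v / n
        b0 -= lr * g0
        beta = [b - lr * gj for b, gj in zip(beta, g)]
    return b0, beta


# Hypothetical simulated data: risk is elevated only when both genes are at level 2.
rng = random.Random(0)
genotypes = [(rng.randrange(3), rng.randrange(3)) for _ in range(300)]
y = [1 if rng.random() < (0.9 if g == (2, 2) else 0.2) else 0 for g in genotypes]
X = [encode(g1, g2) for g1, g2 in genotypes]

b0, beta = fit_l2_logistic(X, y)
p_high = predict_proba(b0, beta, encode(2, 2))   # fitted risk in the interacting cell
p_low = predict_proba(b0, beta, encode(0, 0))    # fitted risk in a baseline cell
print(f"P(case | 2,2) = {p_high:.2f}, P(case | 0,0) = {p_low:.2f}")
```

In this sketch, increasing `lam` shrinks all indicator coefficients toward zero, which trades some fit for stability when many genotype cells are sparsely populated.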
Pages: 30-50
Page count: 21