Multilocus association testing with penalized regression

Cited by: 9
Authors
Basu, Saonli [1]
Pan, Wei [1]
Shen, Xiaotong [2]
Oetting, William S. [3]
Affiliations
[1] Univ Minnesota, Sch Publ Hlth, Div Biostat, Minneapolis, MN 55455 USA
[2] Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
[3] Univ Minnesota, Inst Human Genet, Dept Expt & Clin Pharmacol, Minneapolis, MN 55455 USA
Keywords
Lasso; logistic kernel machine regression; logistic regression; random-effects model; score test; sum of squared score (SSU) test; GENOME-WIDE; LINKAGE DISEQUILIBRIUM; CANDIDATE GENE; MULTIPLE SNPS; P-VALUES; SELECTION; POLYMORPHISMS; SIMILARITY; PREDICTION; INFERENCE
DOI
10.1002/gepi.20625
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
In multilocus association analysis, since some markers may not be associated with a trait, it seems attractive to use penalized regression with the capability of automatic variable selection. On the other hand, in spite of a rapidly growing body of literature on penalized regression, most of it focuses on variable selection and outcome prediction, for which penalized methods are generally more effective than their nonpenalized counterparts. However, for statistical inference, i.e., hypothesis testing and interval estimation, it is less clear how penalized methods would perform, or even how best to apply them, largely due to a lack of studies on this topic. In our motivating data for a cohort of kidney transplant recipients, it is of primary interest to assess whether a group of genetic variants is associated with a binary clinical outcome, acute rejection at 6 months. In this article, we study some technical issues and alternative implementations of hypothesis testing in Lasso penalized logistic regression, and compare their performance with each other and with several existing global tests, some of which are specifically designed as variance component tests for high-dimensional data. The most interesting, and perhaps surprising, conclusion of this study is that, for low to moderately high-dimensional data, statistical tests based on Lasso penalized regression are not necessarily more powerful than some existing global tests. In addition, in penalized regression, rather than building a test based on a single selected best model, combining multiple tests, each of which is built on a candidate model, might be more promising. Genet. Epidemiol. 35:755-765, 2011. © 2011 Wiley Periodicals, Inc.
Pages: 755-765
Page count: 11