Modified versions of Bayesian Information Criterion for genome-wide association studies

Cited by: 30
Authors
Frommlet, Florian [1]
Ruhaltinger, Felix [1]
Twarog, Piotr [2]
Bogdan, Malgorzata [2]
Affiliations
[1] Med Univ Vienna, Dept Med Stat, Vienna, Austria
[2] Wroclaw Univ Technol, Inst Math & Comp Sci, PL-50370 Wroclaw, Poland
Keywords
Genome-wide association; Multiple testing; Linear regression; Model selection; mBIC; Quantitative trait loci; Population; Risk
DOI
10.1016/j.csda.2011.05.005
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
For the vast majority of genome-wide association studies (GWAS) statistical analysis was performed by testing markers individually. Elementary statistical considerations clearly show that in the case of complex traits an approach based on multiple regression or generalized linear models is preferable to testing single markers. A model selection approach to GWAS can be based on modifications of the Bayesian Information Criterion (BIC), where some search strategies are necessary to deal with a huge number of potential models. Comprehensive simulations based on real SNP data confirm that model selection has larger power to detect causal SNPs in complex models than single-marker tests. Furthermore, testing single markers leads to substantial problems with proper ranking of causal SNPs and tends to detect a certain number of false positive SNPs, which are not linked to any of the causal mutations. This behavior of single-marker tests is typical in GWAS for complex traits and can be explained by an aggregated influence of many small random sample correlations between genotypes of the SNP under investigation and other causal SNPs. These findings might at least partially explain problems with low power and nonreplicability of results in GWAS. A real data analysis illustrates advantages of model selection in practice, where publicly available gene expression data as traits for individuals from the HapMap project are reanalyzed. (C) 2011 Elsevier B.V. All rights reserved.
Pages: 1038-1051
Number of pages: 14