Statistical selection of biological models for genome-wide association analyses

Cited by: 4
Authors
Bi, Wenjian [1]
Kang, Guolian [1]
Pounds, Stanley B. [1]
Affiliations
[1] St Jude Childrens Res Hosp, Dept Biostat, 332 N Lauderdale St, Memphis, TN 38105 USA
Funding
National Institutes of Health (USA);
Keywords
Biological models; Genome-wide association study; Multiple adjusted evidence weights; Two-stage discovery validation study; FALSE DISCOVERY RATES; P-VALUES; IDENTIFICATION;
DOI
10.1016/j.ymeth.2018.05.019
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Genome-wide association studies have discovered many biologically important associations of genes with phenotypes. Typically, genome-wide association analyses formally test the association of each genetic feature (SNP, CNV, etc.) with the phenotype of interest and summarize the results with multiplicity-adjusted p-values. However, very small p-values only provide evidence against the null hypothesis of no association; they do not indicate which biological model best explains the observed data. Correctly identifying a specific biological model may improve the scientific interpretation and can be used to select and design a follow-up validation study more effectively. Thus, statistical methodology to identify the correct biological model for a particular genotype-phenotype association can be very useful to investigators. Here, we propose a general statistical method to summarize how accurately each of five biological models (null, additive, dominant, recessive, co-dominant) represents the data observed for each variant in a GWAS. We show that the new method stringently controls the false discovery rate and asymptotically selects the correct biological model. Simulations of two-stage discovery-validation studies show that the new method has these properties and that its validation power is similar to or exceeds that of simple methods that use the same statistical model for all SNPs. Example analyses of three data sets also highlight these advantages of the new method. An R package is freely available at www.stjuderesearch.org/site/depts/biostats/maew.
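To make the five biological models named in the abstract concrete, the following minimal Python sketch shows one conventional way to code a biallelic genotype (0, 1, or 2 copies of the minor allele) as regression covariates under each model. It is an illustration only: the function name `encode_genotype` and the specific codings are assumptions based on common GWAS practice, not the authors' method or the interface of their R package, and the paper's definition of "co-dominant" may differ from the two-indicator coding used here.

```python
# Hypothetical helper (not from the paper or its R package).
# g is the number of minor alleles carried by an individual: 0, 1, or 2.
MODELS = ("null", "additive", "dominant", "recessive", "co-dominant")


def encode_genotype(g, model):
    """Return the genotype covariate(s) for one individual under a model."""
    if g not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2")
    if model == "null":
        # No genotype effect: no genotype covariate enters the model.
        return ()
    if model == "additive":
        # Effect grows linearly with the number of minor alleles.
        return (g,)
    if model == "dominant":
        # One or two minor alleles have the same effect.
        return (1 if g >= 1 else 0,)
    if model == "recessive":
        # Only two minor alleles have an effect.
        return (1 if g == 2 else 0,)
    if model == "co-dominant":
        # Assumed coding: separate indicators for heterozygotes and
        # minor-allele homozygotes (a general two-parameter model).
        return (1 if g == 1 else 0, 1 if g == 2 else 0)
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    for model in MODELS:
        print(model, [encode_genotype(g, model) for g in (0, 1, 2)])
```

Under these codings the additive, dominant, and recessive models each use one genotype parameter and differ only in how heterozygotes are treated, which is why a small p-value alone cannot tell them apart.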
Pages: 67-75
Number of pages: 9