Application of Bayesian network structure learning to identify causal variant SNPs from resequencing data

Cited by: 7
Authors
Christopher E Schlosberg
Tae-Hwi Schwantes-An
Weimin Duan
Nancy L Saccone
Affiliations
[1] Washington University School of Medicine, Division of Biology and Biomedical Sciences
[2] Washington University School of Medicine, Department of Genetics
Keywords
Bayesian Network; Hypergeometric Distribution; Exploratory Phase; Logistic Regression Result; Causal SNPs
DOI
10.1186/1753-6561-5-S9-S109
Abstract
Using single-nucleotide polymorphism (SNP) genotypes from the 1000 Genomes Project pilot 3 data provided for Genetic Analysis Workshop 17 (GAW17), we applied Bayesian network structure learning (BNSL) to identify potential causal SNPs associated with the Affected phenotype. We focused on the setting in which target genes that harbor causal variants have already been chosen for resequencing; the goal was to detect the true causal SNPs among the measured variants in these genes. Examining all available SNPs in the known causal genes, BNSL produced a Bayesian network from which subsets of SNPs connected to the Affected outcome were identified and tested for statistical significance using the hypergeometric distribution. In the Asian population, the exploratory phase of analysis for pooled replicates sometimes identified a set of involved SNPs that contained more true causal SNPs than expected by chance. Analyses of single replicates gave inconsistent results. No nominally significant results were found in analyses of the African or European populations. Overall, the method was not able to identify sets of involved SNPs that included a higher proportion of true causal SNPs than expected by chance alone. We conclude that this method, as currently applied, is not effective for identifying causal SNPs that follow the simulation model for the GAW17 data set, which includes many rare causal SNPs.
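The hypergeometric test mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual enrichment setup: out of all SNPs examined, some are true causal SNPs, and the test asks how likely it is that a randomly drawn set of the same size as the network-selected set would contain at least as many causal SNPs. The function name, parameter names, and the example counts are hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_upper_tail(total_snps, causal_snps, selected_snps, selected_causal):
    """Return P(X >= selected_causal) under a hypergeometric null.

    X is the number of causal SNPs in a random draw of `selected_snps`
    SNPs (without replacement) from `total_snps`, of which `causal_snps`
    are truly causal. All argument names are illustrative assumptions.
    """
    denom = comb(total_snps, selected_snps)
    lower = max(0, selected_causal)
    upper = min(causal_snps, selected_snps)
    # Sum the probability mass for every count at least as extreme as observed.
    tail = sum(
        comb(causal_snps, i) * comb(total_snps - causal_snps, selected_snps - i)
        for i in range(lower, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 100 SNPs examined, 10 causal, 20 selected, 5 of them causal.
p_value = hypergeom_upper_tail(100, 10, 20, 5)
print(f"upper-tail p-value: {p_value:.4f}")
```

A small p-value here would indicate that the selected set holds more causal SNPs than chance alone would explain; the abstract reports that this was generally not the case.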