Learning genetic epistasis using Bayesian network scoring criteria

Cited by: 67
Authors
Jiang, Xia [1]
Neapolitan, Richard E. [5]
Barmada, M. Michael [4]
Visweswaran, Shyam [1, 2, 3]
Affiliations
[1] Univ Pittsburgh, Dept Biomed Informat, Pittsburgh, PA 15213 USA
[2] Univ Pittsburgh, Intelligent Syst Program, Pittsburgh, PA USA
[3] Univ Pittsburgh, Clin & Translat Sci Inst, Pittsburgh, PA USA
[4] Univ Pittsburgh, Dept Human Genet, Pittsburgh, PA USA
[5] NE Illinois Univ, Dept Comp Sci, Chicago, IL 60625 USA
Source
BMC BIOINFORMATICS | 2011 / Vol. 12
Keywords
GENOME-WIDE ASSOCIATION; REVEALS; APOE;
DOI
10.1186/1471-2105-12-89
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Gene-gene epistatic interactions likely play an important role in the genetic basis of many common diseases. Recently, machine-learning and data mining methods have been developed for learning epistatic relationships from data. A well-known combinatorial method that has been successfully applied for detecting epistasis is Multifactor Dimensionality Reduction (MDR). Jiang et al. created a combinatorial epistasis learning method called BNMBL to learn Bayesian network (BN) epistatic models. They compared BNMBL to MDR using simulated data sets. Each of these data sets was generated from a model that associates two SNPs with a disease and includes 18 unrelated SNPs. For each data set, BNMBL and MDR were used to score all 2-SNP models, and BNMBL learned significantly more correct models. In real data sets, we ordinarily do not know the number of SNPs that influence phenotype. BNMBL may not perform as well if we also scored models containing more than two SNPs. Furthermore, a number of other BN scoring criteria have been developed. They may detect epistatic interactions even better than BNMBL. Although BNs are a promising tool for learning epistatic relationships from data, we cannot confidently use them in this domain until we determine which scoring criteria work best or even well when we try learning the correct model without knowledge of the number of SNPs in that model.
Results: We evaluated the performance of 22 BN scoring criteria using 28,000 simulated data sets and a real Alzheimer's GWAS data set. Our results were surprising in that the Bayesian scoring criterion with large values of a hyperparameter called alpha performed best. This score performed better than other BN scoring criteria and MDR at recall using simulated data sets, at detecting the hardest-to-detect models using simulated data sets, and at substantiating previous results using the real Alzheimer's data set.
Conclusions: We conclude that representing epistatic interactions using BN models and scoring them using a BN scoring criterion holds promise for identifying epistatic genetic variants in data. In particular, the Bayesian scoring criterion with large values of the hyperparameter alpha appears more promising than a number of alternatives.
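
The idea of scoring candidate SNP models with a Bayesian criterion that depends on alpha can be illustrated with a small sketch. It assumes the common BDeu form of the Bayesian score, in which alpha is spread evenly over the parameters of a disease node whose parents are the candidate SNPs; the abstract does not state the exact form used in the paper, so this is an illustration under that assumption and not the authors' implementation. The function names, the toy data, and the default `alpha=9.0` are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import lgamma


def bdeu_log_score(genotypes, disease, parents, alpha=9.0, n_geno=3, n_states=2):
    """Log BDeu score of a disease node whose parents are the SNP columns in `parents`.

    Assumes each SNP has `n_geno` genotypes and the disease has `n_states` states
    coded 0..n_states-1. `alpha` is the equivalent sample size (arbitrary default).
    """
    q = n_geno ** len(parents)        # number of possible parent configurations
    a_j = alpha / q                   # prior mass per parent configuration
    a_jk = alpha / (q * n_states)     # prior mass per (configuration, disease state)

    # Count disease states within each observed genotype configuration.
    counts = Counter()
    for row, d in zip(genotypes, disease):
        counts[(tuple(row[p] for p in parents), d)] += 1
    totals = Counter()
    for (cfg, _), n in counts.items():
        totals[cfg] += n

    # Unobserved configurations contribute exactly zero, so they are skipped.
    score = 0.0
    for cfg, n_j in totals.items():
        score += lgamma(a_j) - lgamma(a_j + n_j)
        for k in range(n_states):
            score += lgamma(a_jk + counts[(cfg, k)]) - lgamma(a_jk)
    return score


def rank_models(genotypes, disease, max_size=2, alpha=9.0):
    """Score every SNP subset of size 1..max_size and return them best first."""
    n_snps = len(genotypes[0])
    models = [m for size in range(1, max_size + 1)
              for m in combinations(range(n_snps), size)]
    return sorted(models,
                  key=lambda m: bdeu_log_score(genotypes, disease, m, alpha),
                  reverse=True)


# Toy data: disease is the XOR of SNP 0 and SNP 1, and SNP 2 is unrelated.
genotypes = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 10
disease = [row[0] ^ row[1] for row in genotypes]

for model in rank_models(genotypes, disease)[:3]:
    print(model, round(bdeu_log_score(genotypes, disease, model), 2))
```

In this toy setting neither SNP 0 nor SNP 1 shows a marginal effect, so only the joint model of both captures the interaction. Because models of different sizes are ranked together, the score itself has to trade fit against model size, which is the situation the abstract describes when the number of causal SNPs is unknown.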
Pages: 12