A Bayes factor approach with informative prior for rare genetic variant analysis from next generation sequencing data

Cited by: 2
Authors
Xu, Jingxiong [1, 2]
Xu, Wei [1, 3]
Briollais, Laurent [1, 2]
Affiliations
[1] Univ Toronto, Dalla Lana Sch Publ Hlth, Toronto, ON, Canada
[2] Sinai Hlth Syst, Lunenfeld Tanenbaum Res Inst, Toronto, ON, Canada
[3] Princess Margaret Canc Ctr, Toronto, ON, Canada
Funding
Canadian Institutes of Health Research; Natural Sciences and Engineering Research Council of Canada
Keywords
Bayes factor; Bayesian FDR; gene-based analysis; rare variant; whole-exome sequencing study; ASSOCIATION ANALYSIS; UNCERTAINTY;
DOI
10.1111/biom.13278
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
The discovery of rare genetic variants through next generation sequencing is a very challenging issue in the field of human genetics. We propose a novel region-based statistical approach based on a Bayes factor (BF) to assess evidence of association between a set of rare variants (RVs) located in the same genomic region and a disease outcome in a case-control design. Marginal likelihoods are computed under the null and alternative hypotheses, assuming a binomial distribution for the RV count in the region and either a beta prior or a mixture of a Dirac mass and a beta prior for the probability of an RV. We derive the theoretical null distribution of the BF under our prior setting and show that Bayesian control of the false discovery rate can be obtained for genome-wide inference. Informative priors are introduced using prior evidence of association from a Kolmogorov-Smirnov test statistic. We use our simulation program, sim1000G, to generate RV data similar to those of the 1000 Genomes sequencing project. Our simulation studies showed that the new BF statistic outperforms standard methods (SKAT, SKAT-O, Burden test) in case-control studies with moderate sample sizes and is equivalent to them under large sample size scenarios. Our real data application to a lung cancer case-control study found enrichment for RVs in known and novel cancer genes. It also suggests that using the BF with an informative prior improves overall gene discovery compared with the BF with a noninformative prior.
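As a rough illustration of the marginal-likelihood idea in the abstract, the sketch below computes a log Bayes factor for a simple beta-binomial comparison of cases and controls. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the hyperparameters `a` and `b`, the choice of independent beta priors under the alternative, and the definition of `n_case` and `n_ctrl` as the number of binomial trials per group are all assumptions made here for illustration. The Dirac-beta mixture prior and the informative prior described in the abstract are not covered.

```python
from math import lgamma


def log_beta(a, b):
    """Log of the beta function B(a, b), computed via log-gamma for stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_bayes_factor(y_case, n_case, y_ctrl, n_ctrl, a=1.0, b=1.0):
    """Log BF of H1 (different RV probabilities) versus H0 (one shared probability).

    Assumed model (illustrative only):
      y_case ~ Binomial(n_case, p_case), y_ctrl ~ Binomial(n_ctrl, p_ctrl)
      H0: p_case = p_ctrl = p, with p ~ Beta(a, b)
      H1: p_case and p_ctrl independent, each ~ Beta(a, b)
    The binomial coefficients are identical under both hypotheses and cancel.
    """
    # Marginal likelihood under H1: product of two beta-binomial terms.
    log_m1 = (
        log_beta(a + y_case, b + n_case - y_case) - log_beta(a, b)
        + log_beta(a + y_ctrl, b + n_ctrl - y_ctrl) - log_beta(a, b)
    )
    # Marginal likelihood under H0: one beta-binomial term on the pooled counts.
    y_all = y_case + y_ctrl
    n_all = n_case + n_ctrl
    log_m0 = log_beta(a + y_all, b + n_all - y_all) - log_beta(a, b)
    return log_m1 - log_m0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(log_bayes_factor(30, 100, 2, 100))   # large difference between groups
    print(log_bayes_factor(10, 100, 10, 100))  # identical observed rates
```

A positive value favors the alternative and a negative value favors the null; working on the log scale avoids underflow when the counts are large.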
Pages: 316-328
Number of pages: 13
Related papers
29 items in total
  • [1] [Anonymous], 2010, I MATH STAT MONOGRAPH
  • [2] CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING
    BENJAMINI, Y
    HOCHBERG, Y
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) : 289 - 300
  • [3] Identification of lung cancer histology-specific variants applying Bayesian framework variant prioritization approaches within the TRICL and ILCCO consortia
    Brenner, Darren R.
    Amos, Christopher I.
    Brhane, Yonathan
    Timofeeva, Maria N.
    Caporaso, Neil
    Wang, Yufei
    Christiani, David C.
    Bickeboeller, Heike
    Yang, Ping
    Albanes, Demetrius
    Stevens, Victoria L.
    Gapstur, Susan
    McKay, James
    Boffetta, Paolo
    Zaridze, David
    Szeszenia-Dabrowska, Neonilia
    Lissowska, Jolanta
    Rudnai, Peter
    Fabianova, Eleonora
    Mates, Dana
    Bencko, Vladimir
    Foretova, Lenka
    Janout, Vladimir
    Krokan, Hans E.
    Skorpen, Frank
    Gabrielsen, Maiken E.
    Vatten, Lars
    Njolstad, Inger
    Chen, Chu
    Goodman, Gary
    Lathrop, Mark
    Vooder, Tonu
    Valk, Kristjan
    Nelis, Mari
    Metspalu, Andres
    Broderick, Peter
    Eisen, Timothy
    Wu, Xifeng
    Zhang, Di
    Chen, Wei
    Spitz, Margaret R.
    Wei, Yongyue
    Su, Li
    Xie, Dong
    She, Jun
    Matsuo, Keitaro
    Matsuda, Fumihiko
    Ito, Hidemi
    Risch, Angela
    Heinrich, Joachim
    [J]. CARCINOGENESIS, 2015, 36 (11) : 1314 - 1326
  • [4] The Increasing Importance of Gene-Based Analyses
    Cirulli, Elizabeth T.
    [J]. PLOS GENETICS, 2016, 12 (04):
  • [6] Genomic control for association studies
    Devlin, B
    Roeder, K
    [J]. BIOMETRICS, 1999, 55 (04) : 997 - 1004
  • [7] sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs
    Dimitromanolakis, Apostolos
    Xu, Jingxiong
    Krol, Agnieszka
    Briollais, Laurent
    [J]. BMC BIOINFORMATICS, 2019, 20 (1)
  • [8] Rare and common variants: twenty arguments
    Gibson, Greg
    [J]. NATURE REVIEWS GENETICS, 2012, 13 (02) : 135 - 145
  • [9] Hierarchical Bayesian Model for Rare Variant Association Analysis Integrating Genotype Uncertainty in Human Sequence Data
    He, Liang
    Pitkaniemi, Janne
    Sarin, Antti-Pekka
    Salomaa, Veikko
    Sillanpaa, Mikko J.
    Ripatti, Samuli
    [J]. GENETIC EPIDEMIOLOGY, 2015, 39 (02) : 89 - 100
  • [10] Optimal tests for rare variant effects in sequencing association studies
    Lee, Seunggeun
    Wu, Michael C.
    Lin, Xihong
    [J]. BIOSTATISTICS, 2012, 13 (04) : 762 - 775