Inferring demographic parameters in bacterial genomic data using Bayesian and hybrid phylogenetic methods

Cited by: 11
Authors
Duchene, Sebastian [1]
Duchene, David A. [2]
Geoghegan, Jemma L. [3]
Dyson, Zoe A. [1]
Hawkey, Jane [1]
Holt, Kathryn E. [1]
Affiliations
[1] Univ Melbourne, Mol Sci & Biotechnol Inst Bio21, Dept Biochem & Mol Biol, Parkville, Vic 3020, Australia
[2] Univ Sydney, Sch Life & Environm Sci, Sydney, NSW 2006, Australia
[3] Macquarie Univ, Dept Biol Sci, Sydney, NSW 2109, Australia
Source
BMC EVOLUTIONARY BIOLOGY | 2018 / Vol. 18
Funding
UK Medical Research Council; National Health and Medical Research Council of Australia; Wellcome Trust (UK);
Keywords
Bayesian phylogenetics; Phylodynamics; Molecular clock; Bacterial evolution; ESTIMATING EVOLUTIONARY RATES; EPIDEMIC SPREAD; TRANSMISSION; PERFORMANCE; INFERENCE; HISTORY; MODELS; HIV;
DOI
10.1186/s12862-018-1210-5
Chinese Library Classification number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
Background: Recent developments in sequencing technologies make it possible to obtain genome sequences from a large number of isolates in a very short time. Bayesian phylogenetic approaches can take advantage of these data by simultaneously inferring the phylogenetic tree, evolutionary timescale, and demographic parameters (such as population growth rates), while naturally integrating uncertainty in all parameters. Despite their desirable properties, Bayesian approaches can be computationally intensive, hindering their use for outbreak investigations involving genome data for a large number of pathogen isolates. An alternative to using full Bayesian inference is to use a hybrid approach, where the phylogenetic tree and evolutionary timescale are estimated first using maximum likelihood. Under this hybrid approach, demographic parameters are inferred from estimated trees instead of the sequence data, using maximum likelihood, Bayesian inference, or approximate Bayesian computation. This can vastly reduce the computational burden, but has the disadvantage of ignoring the uncertainty in the phylogenetic tree and evolutionary timescale.
Results: We compared the performance of a fully Bayesian and a hybrid method by analysing six whole-genome SNP data sets from a range of bacteria and simulations. The estimates from the two methods were very similar, suggesting that the hybrid method is a valid alternative for very large datasets. However, we also found that congruence between these methods is contingent on the presence of strong temporal structure in the data (i.e. clocklike behaviour), which is typically verified using a date-randomisation test in a Bayesian framework. To reduce the computational burden of this Bayesian test we implemented a date-randomisation test using a rapid maximum likelihood method, which has similar performance to its Bayesian counterpart.
Conclusions: Hybrid approaches can produce reliable inferences of evolutionary timescales and phylodynamic parameters in a fraction of the time required for fully Bayesian analyses. As such, they are a valuable alternative in outbreak studies involving a large number of isolates.
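The date-randomisation test mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: to stay self-contained it swaps in a least-squares root-to-tip regression slope as a crude rate estimator in place of the maximum likelihood dating method the abstract refers to. The function names, the inputs (`dates`, `distances`), and the pass criterion (observed rate above every rate from the shuffled replicates) are assumptions made for illustration only.

```python
import random
from statistics import mean


def regression_slope(dates, distances):
    """Least-squares slope of root-to-tip distance on sampling date.

    Assumption: used here as a stand-in rate estimator, not the
    maximum likelihood method described in the abstract.
    """
    d_bar, y_bar = mean(dates), mean(distances)
    sxx = sum((d - d_bar) ** 2 for d in dates)
    sxy = sum((d - d_bar) * (y - y_bar) for d, y in zip(dates, distances))
    return sxy / sxx


def date_randomisation_test(dates, distances, n_reps=100, seed=0):
    """Compare the observed rate with rates from shuffled sampling dates.

    Returns (observed_rate, null_rates, passes). `passes` uses a
    simplified, hypothetical criterion: the observed rate must exceed
    every rate obtained after shuffling the dates among the tips.
    """
    rng = random.Random(seed)
    observed = regression_slope(dates, distances)
    null_rates = []
    for _ in range(n_reps):
        shuffled = list(dates)
        rng.shuffle(shuffled)  # break the link between tips and dates
        null_rates.append(regression_slope(shuffled, distances))
    passes = observed > max(null_rates)
    return observed, null_rates, passes


if __name__ == "__main__":
    # Toy data (invented): ten tips sampled yearly, strictly clocklike.
    toy_dates = [2000.0 + i for i in range(10)]
    toy_distances = [0.001 * i for i in range(10)]
    rate, null, ok = date_randomisation_test(toy_dates, toy_distances)
    print(f"observed rate: {rate:.6f}, temporal signal detected: {ok}")
```

Data with weak temporal structure would tend to give an observed rate that falls inside the range of the shuffled replicates, which is the situation in which the abstract reports that the hybrid and fully Bayesian estimates diverge.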
Pages: 11