Bayesian inference of phylogenetic networks from bi-allelic genetic markers

Cited by: 27
Authors
Zhu, Jiafan [1]
Wen, Dingqiao [1]
Yu, Yun [1]
Meudt, Heidi M. [2]
Nakhleh, Luay [1, 3]
Affiliations
[1] Rice Univ, Comp Sci, Houston, TX 77005 USA
[2] Museum New Zealand Te Papa Tongarewa, Wellington, New Zealand
[3] Rice Univ, BioSci, Houston, TX 77005 USA
Funding
National Science Foundation (USA);
Keywords
SPECIES TREES; MAXIMUM-LIKELIHOOD; COALESCENT MODEL; DNA-SEQUENCES; HYBRIDIZATION; INTROGRESSION; PHYLOGENOMICS; EVOLUTION; GENOME; LIGHT;
DOI
10.1371/journal.pcbi.1005932
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Phylogenetic networks are rooted, directed, acyclic graphs that model reticulate evolutionary histories. Recently, statistical methods were devised for inferring such networks from either gene tree estimates or the sequence alignments of multiple unlinked loci. Bi-allelic markers, most notably single nucleotide polymorphisms (SNPs) and amplified fragment length polymorphisms (AFLPs), provide a powerful source of genome-wide data. In a recent paper, a method called SNAPP was introduced for statistical inference of species trees from unlinked bi-allelic markers. The generative process assumed by that method combines a model of evolution for the bi-allelic markers with the multispecies coalescent. A novel component of the method was a polynomial-time algorithm for exact computation of the likelihood of a fixed species tree via integration over all possible gene trees for a given marker. Here we report on a method for Bayesian inference of phylogenetic networks from bi-allelic markers. Our method significantly extends the algorithm for exact computation of phylogenetic network likelihood via integration over all possible gene trees. Unlike the case of species trees, the algorithm is no longer polynomial-time on all instances of phylogenetic networks. Furthermore, the method utilizes a reversible-jump MCMC technique to sample the posterior of phylogenetic networks given bi-allelic marker data. Our method performs well in terms of accuracy and robustness, as we demonstrate on simulated data as well as on a data set of multiple New Zealand species of the plant genus Ourisia (Plantaginaceae). We implemented the method in the publicly available, open-source PhyloNet software package.
Pages: 32