FUBAR: A Fast, Unconstrained Bayesian AppRoximation for Inferring Selection

Cited by: 918
Authors
Murrell, Ben [1, 2, 3]
Moola, Sasha [1, 3]
Mabona, Amandla [1, 4]
Weighill, Thomas [1]
Sheward, Daniel [5]
Kosakovsky Pond, Sergei L. [6]
Scheffler, Konrad [1, 6]
Affiliations
[1] Univ Stellenbosch, Dept Math Sci, ZA-7600 Stellenbosch, South Africa
[2] MRC, Biomed Informat Res Div, eHlth Res & Innovat Platform, Tygerberg, South Africa
[3] Univ Cape Town, Inst Infect Dis & Mol Med, Computat Biol Grp, ZA-7925 Cape Town, South Africa
[4] Univ Cape Town, Dept Math & Appl Math, ZA-7925 Cape Town, South Africa
[5] Univ Cape Town, Inst Infect Dis & Mol Med, Div Med Virol, ZA-7925 Cape Town, South Africa
[6] Univ Calif San Diego, Dept Med, San Diego, CA 92103 USA
Funding
National Institutes of Health (USA); National Research Foundation, Singapore
Keywords
evolutionary model; coding sequence evolution; approximate Bayesian inference; parallel algorithms; HUMAN INFLUENZA-VIRUS; AMINO-ACID SITES; POSITIVE SELECTION; LIKELIHOOD APPROACH; HEMAGGLUTININ GENE; ADAPTIVE EVOLUTION; DNA-SEQUENCES; H-1; SUBTYPE; MODELS; SUBSTITUTIONS;
DOI
10.1093/molbev/mst030
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Model-based analyses of natural selection often categorize sites into a relatively small number of site classes. Forcing each site to belong to one of these classes places unrealistic constraints on the distribution of selection parameters, which can result in misleading inference due to model misspecification. We present an approximate hierarchical Bayesian method using a Markov chain Monte Carlo (MCMC) routine that ensures robustness against model misspecification by averaging over a large number of predefined site classes. This leaves the distribution of selection parameters essentially unconstrained, and also allows sites experiencing positive and purifying selection to be identified orders of magnitude faster than by existing methods. We demonstrate that popular random effects likelihood methods can produce misleading results when sites assigned to the same site class experience different levels of positive or purifying selection, an unavoidable scenario when using a small number of site classes. Our Fast Unconstrained Bayesian AppRoximation (FUBAR) is unaffected by this problem, while achieving higher power than existing unconstrained (fixed effects likelihood) methods. The speed advantage of FUBAR allows us to analyze larger data sets than other methods: we illustrate this on a large influenza hemagglutinin data set (3,142 sequences). FUBAR is available as a batch file within the latest HyPhy distribution (http://www.hyphy.org), as well as on the Datamonkey web server (http://www.datamonkey.org/).
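The idea of averaging over a large number of predefined site classes can be illustrated with a minimal sketch. It is not the HyPhy implementation. It assumes that a matrix `lik[s, k]` of per-site likelihoods at each grid point has already been computed, it uses a hypothetical 3 x 3 grid of synonymous (`alpha`) and nonsynonymous (`beta`) rates with made-up toy likelihoods, and it swaps in a simple EM update for the grid weights where the abstract describes an MCMC routine. All function names are illustrative.

```python
import numpy as np

def make_grid(values):
    """Build every (alpha, beta) pair from one list of rate values (hypothetical grid)."""
    alpha, beta = np.meshgrid(values, values, indexing="ij")
    return alpha.ravel(), beta.ravel()

def fit_weights_em(lik, n_iter=200):
    """Estimate grid-point weights from lik[s, k] = P(site s data | grid point k).

    Assumption: EM is used here for brevity; it is NOT the MCMC routine
    described in the abstract.
    """
    n_points = lik.shape[1]
    w = np.full(n_points, 1.0 / n_points)          # start from uniform weights
    for _ in range(n_iter):
        post = lik * w                             # unnormalized site posteriors
        post /= post.sum(axis=1, keepdims=True)    # normalize per site
        w = post.mean(axis=0)                      # average responsibility per grid point
    return w

def prob_positive(lik, w, alpha, beta):
    """Posterior probability, per site, that beta > alpha (positive selection)."""
    post = lik * w
    post /= post.sum(axis=1, keepdims=True)
    return post[:, beta > alpha].sum(axis=1)

def toy_likelihoods(best_points, n_points, peak=0.9):
    """Made-up likelihoods: each site favors one grid point (illustration only)."""
    lik = np.full((len(best_points), n_points), (1.0 - peak) / (n_points - 1))
    lik[np.arange(len(best_points)), best_points] = peak
    return lik

alpha, beta = make_grid([0.5, 1.0, 2.0])
# Grid index 2 has beta > alpha, index 6 has alpha > beta, index 4 has beta == alpha.
lik = toy_likelihoods([2, 2, 6, 6, 6, 4], n_points=alpha.size)
w = fit_weights_em(lik)
p_pos = prob_positive(lik, w, alpha, beta)
print(np.round(p_pos, 3))
```

Because the per-site likelihoods are fixed once the grid is chosen, only the weights change between iterations, which is consistent with the speed advantage the abstract attributes to predefined site classes.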
Pages: 1196-1205
Page count: 10