Inference of Gene Flow in the Process of Speciation: An Efficient Maximum-Likelihood Method for the Isolation-with-Initial-Migration Model

Cited by: 18
Authors
Costa, Rui J. [1]
Wilkinson-Herbots, Hilde [1]
Affiliation
[1] UCL, Dept Stat Sci, Gower St, London WC1E 6BT, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
speciation; coalescent; maximum-likelihood; gene flow; isolation; pairwise nucleotide differences; ancestral population; composite likelihood; coalescence time; IM model; divergence; number; parameters; recombination; loci
DOI
10.1534/genetics.116.188060
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
The isolation-with-migration (IM) model is commonly used to make inferences about gene flow during speciation, using polymorphism data. However, it has been reported that the parameter estimates obtained by fitting the IM model are very sensitive to the model's assumptions, including the assumption of constant gene flow until the present. This article is concerned with the isolation-with-initial-migration (IIM) model, which drops precisely this assumption. In the IIM model, one ancestral population divides into two descendant subpopulations, between which there is an initial period of gene flow and a subsequent period of isolation. We derive a very fast method of fitting an extended version of the IIM model, which also allows for asymmetric gene flow and unequal population sizes. This is a maximum-likelihood method, applicable to data on the number of segregating sites between pairs of DNA sequences from a large number of independent loci. In addition to obtaining parameter estimates, our method can also be used, by means of likelihood-ratio tests, to distinguish between alternative models representing the following divergence scenarios: (a) divergence with potentially asymmetric gene flow until the present, (b) divergence with potentially asymmetric gene flow until some point in the past and in isolation since then, and (c) divergence in complete isolation. We illustrate the procedure on pairs of Drosophila sequences from approximately 30,000 loci. The computing time needed to fit the most complex version of the model to this data set is only a couple of minutes. The R code to fit the IIM model can be found in the supplementary files of this article.
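To make the abstract's notion of "a likelihood for the number of segregating sites between pairs of sequences from independent loci" concrete, here is a minimal Python sketch of the simplest scenario only, (c) divergence in complete isolation. It is not the authors' R code and not the IIM model (there is no migration). It assumes the infinite-sites model, time measured in units of 2N generations, equal sizes for the ancestral and descendant populations, and theta = 4N*mu per locus. Under these assumptions a within-species pair has a geometric number of differences, while a between-species pair adds a Poisson(theta*tau) count for the time since the split. All function names (`log_p_within`, `log_p_between`, `fit`, `simulate`) are hypothetical, and a coarse grid search stands in for a real optimiser.

```python
import math
import random
from collections import Counter


def log_p_within(k, theta):
    # Two genes from the same population: coalescence time T ~ Exp(1) (units of
    # 2N generations) and S | T ~ Poisson(theta * T). Integrating over T gives a
    # geometric law: P(S = k) = theta^k / (1 + theta)^(k + 1).
    return k * math.log(theta) - (k + 1) * math.log1p(theta)


def log_p_between(k, theta, tau):
    # One gene from each species, complete isolation since time tau (scenario (c)):
    # S = Poisson(theta * tau) + Geometric(theta), so P(S = k) is their convolution,
    # accumulated on the log scale (log-sum-exp) for numerical stability.
    a = theta * tau
    if a == 0.0:
        return log_p_within(k, theta)
    terms = [-a + l * math.log(a) - math.lgamma(l + 1) + log_p_within(k - l, theta)
             for l in range(k + 1)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def log_lik(theta, tau, within, between):
    # Independent loci: the log-likelihood is a sum over loci. `within` and
    # `between` are Counters mapping k (segregating sites) -> number of loci.
    ll = sum(n * log_p_within(k, theta) for k, n in within.items())
    return ll + sum(n * log_p_between(k, theta, tau) for k, n in between.items())


def fit(within, between, fix_tau_zero=False):
    # Crude grid-search MLE; returns (max log-likelihood, theta_hat, tau_hat).
    thetas = [0.05 * i for i in range(1, 81)]                        # 0.05 .. 4.00
    taus = [0.0] if fix_tau_zero else [0.05 * j for j in range(61)]  # 0.00 .. 3.00
    return max((log_lik(th, ta, within, between), th, ta)
               for th in thetas for ta in taus)


def poisson(lam, rng):
    # Knuth's multiplication sampler; adequate for the small means used here.
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate(n_loci, theta, tau, seed=1):
    # One within-species pair and one between-species pair per independent locus.
    rng = random.Random(seed)
    within = [poisson(theta * rng.expovariate(1.0), rng) for _ in range(n_loci)]
    between = [poisson(theta * (tau + rng.expovariate(1.0)), rng) for _ in range(n_loci)]
    return Counter(within), Counter(between)


within, between = simulate(2000, theta=2.0, tau=1.0)
ll_full, theta_hat, tau_hat = fit(within, between)
ll_null, _, _ = fit(within, between, fix_tau_zero=True)  # nested model with tau = 0
print(f"theta_hat={theta_hat:.2f}, tau_hat={tau_hat:.2f}, LRT={2 * (ll_full - ll_null):.1f}")
```

Tabulating loci by their count k keeps each likelihood evaluation cheap even for many loci, because only the distinct values of k are visited. The final lines compare two nested models with a likelihood-ratio statistic, in the spirit of the model comparisons described above. Since tau = 0 lies on the boundary of the parameter space, that statistic is usually referred to a 50:50 mixture of chi-squared distributions with 0 and 1 degrees of freedom rather than to a plain chi-squared with 1 degree of freedom.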
Pages: 1597-1618
Number of pages: 22