Ancestral sequence reconstruction for co-evolutionary models

Cited by: 1
Authors
Rodriguez-Horta, Edwin [1, 2, 3]
Lage-Castellanos, Alejandro [2, 3]
Mulet, Roberto [2, 3]
Affiliations
[1] Sorbonne Univ, Lab Biol Computat & Quantitat LCQB, CNRS, Inst Biol Paris Seine, Paris, France
[2] Univ Havana, Phys Fac, Grp Complex Syst & Stat Phys, Havana 10400, Cuba
[3] Univ Havana, Phys Fac, Dept Theoret Phys, Havana 10400, Cuba
Source
JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT | Year 2022 / Volume 2022 / Issue 01
Keywords
co-evolution; statistical inference in biological systems; computational biology; evolutionary processes; belief propagation; epistasis
DOI
10.1088/1742-5468/ac3d93
Chinese Library Classification number
O3 [Mechanics]
Discipline classification code
08; 0801
Abstract
The ancestral sequence reconstruction problem is the inference, back in time, of the properties of common sequence ancestors from the measured properties of contemporary populations. Standard algorithms for this problem assume independent (factorized) evolution of the characters of the sequences, an assumption that is generally violated (e.g. in protein and genome sequences). In this work, we studied this problem for sequences described by global co-evolutionary models, which reproduce the global pattern of cooperative interactions between the elements that compose them. To this end, we first modeled the temporal evolution of correlated real-valued characters with a multivariate Ornstein-Uhlenbeck process on a finite tree. This represents sequences as Gaussian vectors evolving in a quadratic potential, which describes the selection forces acting on the evolving entities. Within a Bayesian framework, we developed a reconstruction algorithm for these sequences and obtained an analytical expression to quantify the quality of our estimation. We extended this formalism to discrete-valued sequences by applying our method to a Potts model. We showed that, for both continuous and discrete configurations, there is a wide range of parameters where intra-species correlations must be taken into account to properly reconstruct the ancestral sequences. We also demonstrated that, for sequences with discrete elements, our reconstruction algorithm outperforms traditional schemes based on independent-site approximations.
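To make the continuous-valued setting in the abstract concrete, here is a minimal sketch of Bayesian ancestor reconstruction under a multivariate Ornstein-Uhlenbeck process. It is not the authors' algorithm. It assumes a star tree (one ancestor, independent branches to each leaf), a symmetric positive-definite coupling matrix `J`, the dynamics dx = -J x dt + sqrt(2) dW (so the stationary covariance is the inverse of `J`), and a stationary prior on the ancestor. The function names `ou_transition` and `posterior_root` are hypothetical.

```python
import numpy as np

def ou_transition(J, t):
    """Return (A, S) such that x(t) | x(0) ~ N(A @ x(0), S).

    Assumes dx = -J x dt + sqrt(2) dW with J symmetric positive definite,
    so that the stationary covariance is C = inv(J).
    """
    w, V = np.linalg.eigh(J)           # eigen-decomposition of symmetric J
    A = (V * np.exp(-w * t)) @ V.T     # propagator exp(-J t)
    C = (V / w) @ V.T                  # stationary covariance inv(J)
    S = C - A @ C @ A.T                # covariance accumulated over time t
    return A, S

def posterior_root(J, leaves, times):
    """Gaussian posterior (mean, covariance) of the ancestor on a star tree.

    leaves: observed leaf vectors; times: branch length to each leaf.
    The ancestor prior is the stationary distribution N(0, inv(J)).
    """
    J = np.array(J, dtype=float)
    precision = J.copy()               # prior precision is J itself
    shift = np.zeros(J.shape[0])
    for x, t in zip(leaves, times):
        A, S = ou_transition(J, t)
        S_inv = np.linalg.inv(S)
        precision += A.T @ S_inv @ A   # each leaf adds information
        shift += A.T @ S_inv @ np.asarray(x, dtype=float)
    cov = np.linalg.inv(precision)
    return cov @ shift, cov

# Two correlated characters, two leaves at different branch lengths.
J = np.array([[2.0, 0.5], [0.5, 1.0]])
mean, cov = posterior_root(J, [[1.0, -0.3], [0.8, 0.1]], [0.4, 0.9])
print(mean, cov)
```

The off-diagonal entries of `J` couple the characters, so each reconstructed character depends on all observed characters; setting them to zero recovers an independent-site reconstruction. The trace of the posterior covariance gives one simple measure of reconstruction uncertainty in this toy setting.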
Pages: 28