Modeling Sequence-Space Exploration and Emergence of Epistatic Signals in Protein Evolution

Cited by: 25
Authors
Bisardi, Matteo [1, 2]
Rodriguez-Rivas, Juan [2 ]
Zamponi, Francesco [1 ]
Weigt, Martin [2 ]
Affiliations
[1] Univ Paris, Sorbonne Univ, Univ PSL, Lab Phys,Ecole Normale Super,CNRS, Paris, France
[2] Sorbonne Univ, Inst Biol Paris Seine, CNRS, Biol Computat & Quantitat LCQB, Paris, France
Funding
European Union Horizon 2020
Keywords
protein evolution; fitness landscapes; sequence space; epistasis; data-driven models; RESIDUE COEVOLUTION; LANDSCAPES; INFORMATION; CONTACTS; MUTATION
DOI
10.1093/molbev/msab321
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
During their evolution, proteins explore sequence space via an interplay between random mutations and phenotypic selection. Here, we build upon recent progress in reconstructing data-driven fitness landscapes for families of homologous proteins, to propose stochastic models of experimental protein evolution. These models predict quantitatively important features of experimentally evolved sequence libraries, like fitness distributions and position-specific mutational spectra. They also allow us to efficiently simulate sequence libraries for a vast array of combinations of experimental parameters like sequence divergence, selection strength, and library size. We showcase the potential of the approach in reanalyzing two recent experiments to determine protein structure from signals of epistasis emerging in experimental sequence libraries. To be detectable, these signals require sufficiently large and sufficiently diverged libraries. Our modeling framework offers a quantitative explanation for different outcomes of recently published experiments. Furthermore, we can forecast the outcome of time- and resource-intensive evolution experiments, opening thereby a way to computationally optimize experimental protocols.
Pages: 12