Modeling Sequence-Space Exploration and Emergence of Epistatic Signals in Protein Evolution

Cited by: 28
Authors
Bisardi, Matteo [1, 2]
Rodriguez-Rivas, Juan [2 ]
Zamponi, Francesco [1 ]
Weigt, Martin [2 ]
Affiliations
[1] Univ Paris, Sorbonne Univ, Univ PSL, Lab Phys,Ecole Normale Super,CNRS, Paris, France
[2] Sorbonne Univ, Inst Biol Paris Seine, CNRS, Biol Computat & Quantitat LCQB, Paris, France
Funding
European Union Horizon 2020
Keywords
protein evolution; fitness landscapes; sequence space; epistasis; data-driven models; RESIDUE COEVOLUTION; LANDSCAPES; INFORMATION; CONTACTS; MUTATION
DOI
10.1093/molbev/msab321
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
During their evolution, proteins explore sequence space via an interplay between random mutations and phenotypic selection. Here, we build upon recent progress in reconstructing data-driven fitness landscapes for families of homologous proteins to propose stochastic models of experimental protein evolution. These models quantitatively predict important features of experimentally evolved sequence libraries, such as fitness distributions and position-specific mutational spectra. They also allow us to efficiently simulate sequence libraries for a vast array of combinations of experimental parameters such as sequence divergence, selection strength, and library size. We showcase the potential of the approach by reanalyzing two recent experiments that determine protein structure from signals of epistasis emerging in experimental sequence libraries. To be detectable, these signals require sufficiently large and sufficiently diverged libraries. Our modeling framework offers a quantitative explanation for the different outcomes of recently published experiments. Furthermore, we can forecast the outcome of time- and resource-intensive evolution experiments, thereby opening a way to computationally optimize experimental protocols.
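The interplay of random mutation and selection described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' model. It assumes a Potts-like energy with randomly drawn fields and couplings (standing in for a landscape inferred from a homologous family) and a Metropolis acceptance rule. All names (`make_toy_landscape`, `evolve`, `simulate_library`) and parameters are hypothetical: `beta` plays the role of selection strength, `steps` of sequence divergence, and `library_size` of library size.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def make_toy_landscape(length, rng):
    """Draw random fields h and couplings J (assumption: a stand-in for inferred parameters)."""
    q = len(ALPHABET)
    h = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(length)]
    J = {(i, j): [[rng.gauss(0, 0.1) for _ in range(q)] for _ in range(q)]
         for i in range(length) for j in range(i + 1, length)}
    return h, J


def energy(seq, h, J):
    """Statistical energy of a sequence of residue indices; lower is read as fitter."""
    e = -sum(h[i][a] for i, a in enumerate(seq))
    e -= sum(J[(i, j)][seq[i]][seq[j]] for (i, j) in J)
    return e


def evolve(wild_type, h, J, steps, beta, rng):
    """Apply `steps` proposed point mutations, each accepted by a Metropolis rule."""
    seq = list(wild_type)
    e = energy(seq, h, J)
    for _ in range(steps):
        i = rng.randrange(len(seq))
        old = seq[i]
        seq[i] = rng.randrange(len(ALPHABET))  # random mutation proposal
        e_new = energy(seq, h, J)
        # Selection: always keep improvements, keep deleterious moves with prob. exp(-beta * dE)
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            e = e_new
        else:
            seq[i] = old  # mutation rejected
    return seq


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def simulate_library(length=10, library_size=50, steps=20, beta=1.0, seed=0):
    """Evolve `library_size` independent sequences starting from one random wild type."""
    rng = random.Random(seed)
    h, J = make_toy_landscape(length, rng)
    wild_type = [rng.randrange(len(ALPHABET)) for _ in range(length)]
    library = [evolve(wild_type, h, J, steps, beta, rng) for _ in range(library_size)]
    return wild_type, library
```

Calling `simulate_library()` and averaging `hamming(wild_type, s)` over the returned library gives a rough measure of divergence. Sweeping `steps`, `beta`, and `library_size` mimics, in a very reduced form, the scan over experimental parameters that the abstract mentions.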
Pages: 12