De novo protein design by deep network hallucination

Cited by: 294
Authors
Anishchenko, Ivan [1, 2]
Pellock, Samuel J. [1, 2]
Chidyausiku, Tamuka M. [1, 2]
Ramelot, Theresa A. [3, 4]
Ovchinnikov, Sergey [5]
Hao, Jingzhou [3, 4]
Bafna, Khushboo [3, 4]
Norn, Christoffer [1, 2]
Kang, Alex [1, 2]
Bera, Asim K. [1, 2]
DiMaio, Frank [1, 2]
Carter, Lauren [1, 2]
Chow, Cameron M. [1, 2]
Montelione, Gaetano T. [3, 4]
Baker, David [1, 2, 6]
Affiliations
[1] Univ Washington, Dept Biochem, Seattle, WA 98195 USA
[2] Univ Washington, Inst Prot Design, Seattle, WA 98195 USA
[3] Rensselaer Polytech Inst, Dept Chem & Chem Biol, Troy, NY USA
[4] Rensselaer Polytech Inst, Ctr Biotechnol & Interdisciplinary Sci, Troy, NY USA
[5] Harvard Univ, John Harvard Distinguished Sci Fellowship Program, Cambridge, MA 02138 USA
[6] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98195 USA
Keywords
NMR STRUCTURE; SOFTWARE SUITE; PREDICTION; ALIGNMENT; ALGORITHM; FEATURES; SYSTEM;
DOI
10.1038/s41586-021-04184-w
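Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract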
There has been considerable recent progress in protein structure prediction using deep neural networks to predict inter-residue distances from amino acid sequences (1-3). Here we investigate whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generate random amino acid sequences, and input them into the trRosetta structure prediction network to predict starting residue-residue distance maps, which, as expected, are quite featureless. We then carry out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (Kullback-Leibler divergence) between the inter-residue distance distributions predicted by the network and background distributions averaged over all proteins. Optimization from different random starting points resulted in novel proteins spanning a wide range of sequences and predicted structures. We obtained synthetic genes encoding 129 of the network-'hallucinated' sequences, and expressed and purified the proteins in Escherichia coli; 27 of the proteins yielded monodisperse species with circular dichroism spectra consistent with the hallucinated structures. We determined the three-dimensional structures of three of the hallucinated proteins, two by X-ray crystallography and one by NMR, and these closely matched the hallucinated models. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute alongside traditional physics-based models to the de novo design of proteins with new functions.
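The sampling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_predict` is a hypothetical stand-in for the trRosetta network, the background is assumed to be uniform rather than averaged over all proteins, and the Metropolis acceptance rule, the temperature, and the bin count are illustrative assumptions. Only the overall idea (mutate a random sequence and keep changes that raise the Kullback-Leibler divergence between predicted and background distance distributions) comes from the abstract.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_BINS = 8  # hypothetical number of distance bins


def toy_predict(seq):
    """Hypothetical stand-in for a structure prediction network (NOT trRosetta).

    Returns, for every residue pair (i, j) with i < j, a probability
    distribution over N_BINS distance bins.
    """
    dists = {}
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            a = AMINO_ACIDS.index(seq[i]) + 1
            b = AMINO_ACIDS.index(seq[j]) + 1
            # Arbitrary deterministic logits; a real network would go here.
            logits = [
                math.sin(0.7 * a * (k + 1)) + math.cos(0.3 * b * (k + 1) + (j - i))
                for k in range(N_BINS)
            ]
            z = sum(math.exp(x) for x in logits)
            dists[(i, j)] = [math.exp(x) / z for x in logits]  # softmax
    return dists


def mean_kl(pred, background):
    """Average KL(P_ij || Q) over all residue pairs."""
    total = 0.0
    for p in pred.values():
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, background))
    return total / len(pred)


def hallucinate(length=12, steps=200, temperature=0.05, seed=0):
    """Monte Carlo search in sequence space that maximizes mean KL divergence."""
    rng = random.Random(seed)
    background = [1.0 / N_BINS] * N_BINS  # assumption: uniform background
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = mean_kl(toy_predict(seq), background)
    start_score = score
    best_seq, best_score = seq[:], score
    for _ in range(steps):
        # Propose a single-residue substitution at a random position.
        cand = seq[:]
        cand[rng.randrange(length)] = rng.choice(AMINO_ACIDS)
        cand_score = mean_kl(toy_predict(cand), background)
        delta = cand_score - score
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            seq, score = cand, cand_score
            if score > best_score:
                best_seq, best_score = seq[:], score
    return "".join(best_seq), start_score, best_score


if __name__ == "__main__":
    final_seq, start, best = hallucinate()
    print(final_seq, round(start, 3), round(best, 3))
```

Running the search from different seeds mirrors the abstract's "different random starting points"; with the toy predictor the resulting sequences carry no structural meaning.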
Pages: 547 / +
Page count: 19
Related papers
60 in total
  • [1] PREPARATION OF PROTEIN SAMPLES FOR NMR STRUCTURE, FUNCTION, AND SMALL-MOLECULE SCREENING STUDIES
    Acton, Thomas B.
    Xiao, Rong
    Anderson, Stephen
    Aramini, James
    Buchwald, William A.
    Ciccosanti, Colleen
    Conover, Ken
    Everett, John
    Hamilton, Keith
    Huang, Yuanpeng Janet
    Janjua, Haleema
    Kornhaber, Gregory
    Lau, Jessica
    Lee, Dong Yup
    Liu, Gaohua
    Maglaqui, Melissa
    Ma, Lichung
    Mao, Lei
    Patel, Dayaban
    Rossi, Paolo
    Sahdev, Seema
    Shastry, Ritu
    Swapna, G. V. T.
    Tang, Yeufeng
    Tong, Saichiu
    Wang, Dongyan
    Wang, Huang
    Zhao, Li
    Montelione, Gaetano T.
    [J]. FRAGMENT-BASED DRUG DESIGN: TOOLS, PRACTICAL APPROACHES, AND EXAMPLES, 2011, 493 : 21 - 60
  • [2] BASIC LOCAL ALIGNMENT SEARCH TOOL
    ALTSCHUL, SF
    GISH, W
    MILLER, W
    MYERS, EW
    LIPMAN, DJ
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) : 403 - 410
  • [3] Anand N., 2019, ICLR
  • [4] Anand N, 2020, PROTEIN SEQUENCE DES, DOI DOI 10.1101/2020.01.06.895466
  • [5] Waterworks-specific composition of drinking water disinfection by-products
    Andersson, Anna
    Harir, Mourad
    Gonsior, Michael
    Hertkorn, Norbert
    Schmitt-Kopplin, Philippe
    Kylin, Henrik
    Karlsson, Susanne
    Ashiq, Muhammad Jamshaid
    Lavonen, Elin
    Nilsson, Kerstin
    Pettersson, AEmma
    Stavklint, Helena
    Bastviken, David
    [J]. ENVIRONMENTAL SCIENCE-WATER RESEARCH & TECHNOLOGY, 2019, 5 (05) : 861 - 872
  • [6] [Anonymous], 2021, The PyMOL Molecular Graphics System
  • [7] [Anonymous], 2015, Inceptionism: Going deeper into neural networks
  • [8] Accurate prediction of protein structures and interactions using a three-track neural network
    Baek, Minkyung
    DiMaio, Frank
    Anishchenko, Ivan
    Dauparas, Justas
    Ovchinnikov, Sergey
    Lee, Gyu Rie
    Wang, Jue
    Cong, Qian
    Kinch, Lisa N.
    Schaeffer, R. Dustin
    Millan, Claudia
    Park, Hahnbeom
    Adams, Carson
    Glassman, Caleb R.
    DeGiovanni, Andy
    Pereira, Jose H.
    Rodrigues, Andria V.
    van Dijk, Alberdina A.
    Ebrecht, Ana C.
    Opperman, Diederik J.
    Sagmeister, Theo
    Buhlheller, Christoph
    Pavkov-Keller, Tea
    Rathinaswamy, Manoj K.
    Dalwadi, Udit
    Yip, Calvin K.
    Burke, John E.
    Garcia, K. Christopher
    Grishin, Nick V.
    Adams, Paul D.
    Read, Randy J.
    Baker, David
    [J]. SCIENCE, 2021, 373 (6557) : 871 - +
  • [9] Evaluating protein structures determined by structural genomics consortia
    Bhattacharya, Aneerban
    Tejero, Roberto
    Montelione, Gaetano T.
    [J]. PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2007, 66 (04) : 778 - 795
  • [10] Low-N protein engineering with data-efficient deep learning
    Biswas, Surojit
    Khimulya, Grigory
    Alley, Ethan C.
    Esvelt, Kevin M.
    Church, George M.
    [J]. NATURE METHODS, 2021, 18 (04) : 389 - +