PRESTO: Rapid protein mechanical strength prediction with an end-to-end deep learning model

Cited by: 11
Authors
Liu, Frank Y. C. [1]
Ni, Bo [1]
Buehler, Markus J. [1]
Affiliations
[1] MIT, Dept Civil & Environm Engn, Lab Atomist & Mol Mech LAMM, 77 Massachusetts Ave 1-165, Cambridge, MA 02139 USA
Keywords
Deep learning; Protein; Protein mechanical strength; Biomaterials; Pulling force; DIVERSITY; MIXTURES; RULE
DOI
10.1016/j.eml.2022.101803
Chinese Library Classification number
TH [Machinery and instrument industry]
Discipline classification code
0802
Abstract
Proteins often form biomaterials with exceptional mechanical properties equal to or even superior to those of synthetic materials. Currently, evaluating protein mechanical strength with experimental atomic force microscopy or computational molecular dynamics remains costly and time-consuming, limiting large-scale de novo protein investigations. There is therefore a growing demand for fast and accurate prediction of protein mechanical strength. To address this challenge, we propose PRESTO, a rapid end-to-end deep learning (DL) model that predicts a protein's resistance to pulling directly from its sequence. By integrating a natural language processing model with simulation-based protein stretching data, we first demonstrate that PRESTO can accurately predict the maximal pulling force, F_max, for given protein sequences with unprecedented efficiency, bypassing the costly steps of conventional methods. Enabled by this rapid prediction capacity, we further find that PRESTO can successfully identify specific mutation locations that may greatly influence protein strength in a biologically plausible manner, such as at the center of polyalanine regions. Finally, we apply our method to design de novo protein sequences by randomly mixing two known sequences at varying ratios. Interestingly, the model predicts that the strength of these mixed proteins follows up- or down-opening "banana curves", constructing a protein strength curve that breaks away from the general linear law of mixtures. By discovering key insights and suggesting potential optimal sequences, we demonstrate the versatility of PRESTO primarily as a screening tool in a rapid protein design pipeline. Our model may thereby offer new pathways for protein material research that requires analysis and testing of large-scale novel protein sets, as a discovery tool that can be complemented with other modeling methods and, ultimately, experimental synthesis and testing. © 2022 Elsevier Ltd. All rights reserved.
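The mixing experiment described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the two sequences are mixed, so the position-wise random choice below, the equal-length requirement, and the names `mix_sequences` and `linear_rule_of_mixtures` are all assumptions made for illustration and are not the authors' implementation. The strength predictor itself is not reproduced here; the sketch only shows how a mixed sequence and the linear law-of-mixtures baseline, against which a "banana curve" would deviate, might be computed.

```python
import random


def mix_sequences(seq_a, seq_b, ratio_b, rng):
    """Return a position-wise random mix of two sequences.

    Assumption (not from the paper): both sequences have equal length, and
    each position takes its residue from seq_b with probability ratio_b,
    otherwise from seq_a.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes equal-length sequences")
    if not 0.0 <= ratio_b <= 1.0:
        raise ValueError("ratio_b must lie in [0, 1]")
    return "".join(
        b if rng.random() < ratio_b else a for a, b in zip(seq_a, seq_b)
    )


def linear_rule_of_mixtures(f_a, f_b, ratio_b):
    """Linear baseline: strength interpolated between the two parents."""
    return (1.0 - ratio_b) * f_a + ratio_b * f_b


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the mix is reproducible
    parent_a = "AAAAAAAAAA"  # hypothetical toy sequences, not real proteins
    parent_b = "GGGGGGGGGG"
    for ratio in (0.0, 0.25, 0.5, 0.75, 1.0):
        mixed = mix_sequences(parent_a, parent_b, ratio, rng)
        # Placeholder parent strengths; real values would come from a model.
        baseline = linear_rule_of_mixtures(100.0, 200.0, ratio)
        print(ratio, mixed, baseline)
```

In a screening setting, each mixed sequence would be passed to a strength predictor and the predicted value compared with the linear baseline at the same ratio; a consistent deviation above or below that line corresponds to the curved behavior the abstract reports.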
Pages: 11