How Deep Learning Tools Can Help Protein Engineers Find Good Sequences

Cited by: 7
Authors
Osadchy, Margarita [1]
Kolodny, Rachel [1]
Affiliations
[1] Univ Haifa, Dept Comp Sci, Jacobs Bldg, IL-3498838 Haifa, Israel
Keywords
NEURAL-NETWORK; RECOGNITION; PREDICTION; DESIGN; DNA
DOI
10.1021/acs.jpcb.1c02449
Chinese Library Classification
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract
The deep learning revolution introduced a new and efficacious way to address computational challenges in a wide range of fields, relying on large data sets and powerful computational resources. In protein engineering, we consider the challenge of computationally predicting properties of a protein and designing sequences with these properties. Indeed, accurate and fast deep network oracles for different properties of proteins have been developed. These learn to predict a property from an amino acid sequence by training on large sets of proteins that have this property. In particular, deep networks can learn from the set of all known protein sequences to identify ones that are protein-like. A fundamental challenge when engineering sequences that are both protein-like and satisfy a desired property is that these are rare instances within the vast space of all possible ones. When searching for these very rare instances, one would like to use good sampling procedures. Sampling approaches that are decoupled from the prediction of the property or in which the predictor uses only post-sampling to identify good instances are less efficient. The alternative is to use sampling methods that are geared to generate sequences satisfying and/or optimizing the predictor's desired properties. Deep learning has a class of architectures, denoted as generative models, which offer the capability of sampling from the learned distribution of a predicted property. Here, we review the use of deep learning tools to find good sequences for protein engineering, including developing oracles/predictors of a property of the proteins and methods that sample from a distribution of protein-like sequences to optimize the desired property.
Pages: 6440-6450
Number of pages: 11