Machine learning-assisted directed protein evolution with combinatorial libraries

Times cited: 357
Authors
Wu, Zachary [1]
Kan, S. B. Jennifer [1]
Lewis, Russell D. [2]
Wittmann, Bruce J. [2]
Arnold, Frances H. [1,2]
Affiliations
[1] CALTECH, Div Chem & Chem Engn, Pasadena, CA 91125 USA
[2] CALTECH, Div Biol & Bioengn, Pasadena, CA 91125 USA
Funding
National Science Foundation (USA)
Keywords
protein engineering; machine learning; directed evolution; enzyme; catalysis; fitness landscape; optimization; silicon
DOI
10.1073/pnas.1901979116
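Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract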
To reduce experimental effort associated with directed protein evolution and to explore the sequence space encoded by mutating multiple positions simultaneously, we incorporate machine learning into the directed evolution workflow. Combinatorial sequence space can be quite expensive to sample experimentally, but machine-learning models trained on tested variants provide a fast method for testing sequence space computationally. We validated this approach on a large published empirical fitness landscape for human GB1 binding protein, demonstrating that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches. We then provide an example application in evolving an enzyme to produce each of the two possible product enantiomers (i.e., stereodivergence) of a new-to-nature carbene Si-H insertion reaction. The approach predicted libraries enriched in functional enzymes and fixed seven mutations in two rounds of evolution to identify variants for selective catalysis with 93% and 79% ee (enantiomeric excess). By greatly increasing throughput with in silico modeling, machine learning enhances the quality and diversity of sequence solutions for a protein engineering problem.
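The core idea in the abstract, training a model on the variants already tested and then using it to rank the rest of a combinatorial library in silico, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three-position library, the synthetic additive fitness landscape standing in for experimental measurements, the one-hot linear model, and the names `features`, `fit`, `predict`, and `demo`. This record does not state which encodings or model types the authors used, so the sketch should not be read as their method.

```python
import itertools
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical amino acids
N_POS = 3  # toy library: 20**3 = 8,000 variants (assumption, kept small for speed)


def features(variant):
    """Indices of the active one-hot features for a variant such as 'VDG'."""
    return [pos * len(ALPHABET) + ALPHABET.index(aa) for pos, aa in enumerate(variant)]


def predict(weights, bias, variant):
    """Linear model on one-hot features: bias plus one weight per (position, residue)."""
    return bias + sum(weights[j] for j in features(variant))


def fit(train, lr=0.1, epochs=300, l2=1e-3):
    """Fit the linear model by stochastic gradient descent with a small L2 penalty."""
    weights = [0.0] * (N_POS * len(ALPHABET))
    bias = 0.0
    for _ in range(epochs):
        for variant, fitness in train:
            err = predict(weights, bias, variant) - fitness
            bias -= lr * err
            for j in features(variant):
                weights[j] -= lr * (err + l2 * weights[j])
    return weights, bias


def demo(seed=0, n_train=200, n_top=10):
    """Train on a random sample of 'tested' variants, then rank the untested ones."""
    rng = random.Random(seed)

    # Hypothetical additive landscape; a real campaign would use measured fitness.
    effects = [rng.gauss(0.0, 1.0) for _ in range(N_POS * len(ALPHABET))]

    def true_fitness(variant):
        return sum(effects[j] for j in features(variant))

    library = ["".join(p) for p in itertools.product(ALPHABET, repeat=N_POS)]
    tested = rng.sample(library, n_train)
    tested_set = set(tested)
    train = [(v, true_fitness(v)) for v in tested]

    weights, bias = fit(train)

    # Score every untested variant computationally and keep the best predictions.
    untested = [v for v in library if v not in tested_set]
    ranked = sorted(untested, key=lambda v: predict(weights, bias, v), reverse=True)
    top = ranked[:n_top]

    return {
        "top": top,
        "mean_top_true": sum(true_fitness(v) for v in top) / len(top),
        "mean_train_true": sum(f for _, f in train) / len(train),
    }
```

Calling `demo()` returns the top-ranked untested variants together with their mean true fitness and the mean fitness of the training sample, so the enrichment from model-guided selection can be compared against random screening. A purely additive landscape is the easiest case for a linear model; real landscapes with epistasis between positions would call for richer models.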
Pages: 8852-8858
Page count: 7