LevSeq: Rapid Generation of Sequence-Function Data for Directed Evolution and Machine Learning

Cited by: 1
Authors
Long, Yueming [1]
Mora, Ariane [1]
Li, Francesca-Zhoufan [2]
Gursoy, Emre [1,3]
Johnston, Kadina E. [2,4]
Arnold, Frances H. [1,2]
Affiliations
[1] CALTECH, Div Chem & Chem Engn, Pasadena, CA 91125 USA
[2] CALTECH, Div Biol & Bioengn, Pasadena, CA 91125 USA
[3] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, Schanzenstr 44, CH-4056 Basel, Switzerland
[4] Merck & Co Inc, Discovery Biol, South San Francisco, CA 94080 USA
Source
ACS SYNTHETIC BIOLOGY | 2024 / Vol. 14 / Issue 01
Keywords
Directed Evolution; Protein Engineering; Nanopore Sequencing; Sequence-Function Data; Machine Learning; Mutagenesis Libraries
DOI
10.1021/acssynbio.4c00625
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Sequence-function data provides valuable information about the protein functional landscape but is rarely obtained during directed evolution campaigns. Here, we present Long-read every variant Sequencing (LevSeq), a pipeline that combines a dual barcoding strategy with nanopore sequencing to rapidly generate sequence-function data for entire protein-coding genes. LevSeq integrates into existing protein engineering workflows and comes with open-source software for data analysis and visualization. The pipeline facilitates data-driven protein engineering by consolidating sequence-function data to inform directed evolution and provide the requisite data for machine learning-guided protein engineering (MLPE). LevSeq enables quality control of mutagenesis libraries prior to screening, which reduces time and resource costs. Simulation studies demonstrate LevSeq's ability to accurately detect variants under various experimental conditions. Finally, we show LevSeq's utility in engineering protoglobins for new-to-nature chemistry. Widespread adoption of LevSeq and sharing of the data will enhance our understanding of protein sequence-function landscapes and empower data-driven directed evolution.
Pages: 230-238
Number of pages: 9