Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design

Times cited: 6
Authors
Li, Chengxi [1, 2, 3]
Zhang, Genwei [1]
Mohapatra, Somesh [4]
Callahan, Alex J. [1]
Loas, Andrei [1]
Gomez-Bombarelli, Rafael [4]
Pentelute, Bradley L. [1, 5, 6, 7]
Affiliations
[1] MIT, Dept Chem, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Zhejiang Univ, Coll Chem & Biol Engn, 866 Yuhangtang Rd, Hangzhou 310030, Zhejiang, Peoples R China
[3] ZJU Hangzhou Global Sci & Technol Innovat Ctr, 733 Jianshe San Rd, Hangzhou 311200, Zhejiang, Peoples R China
[4] MIT, Dept Mat Sci & Engn, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[5] MIT, Koch Inst Integrat Canc Res, 500 Main St, Cambridge, MA 02142 USA
[6] MIT, Ctr Environm Hlth Sci, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[7] Broad Inst MIT & Harvard, 415 Main St, Cambridge, MA 02142 USA
Keywords
automated synthesis; drug design; machine learning; peptide nucleic acid; yield prediction; DISCOVERY; PREDICTION; STABILITY
DOI
10.1002/advs.202201988
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
Peptide nucleic acids (PNAs) are potential antisense therapies for genetic, acquired, and viral diseases. Efficiently selecting candidate PNA sequences for synthesis and evaluation from a genome containing hundreds to thousands of options can be challenging. To facilitate this process, this work leverages machine learning (ML) algorithms and automated synthesis technology to predict PNA synthesis efficiency and guide rational PNA sequence design. The training data are collected from individual fluorenylmethyloxycarbonyl (Fmoc) deprotection reactions performed on a fully automated PNA synthesizer. The optimized ML model achieves 93% prediction accuracy and a Pearson's r of 0.97. The predicted synthesis scores are shown to correlate with the experimental high-performance liquid chromatography (HPLC) crude purities (correlation coefficient R² = 0.95). Furthermore, the general applicability of ML is demonstrated by designing synthetically accessible antisense PNA sequences from 102 315 predicted candidates targeting exon 44 of the human dystrophin gene, SARS-CoV-2, HIV, as well as selected genes associated with cardiovascular diseases, type II diabetes, and various cancers. Collectively, ML provides an accurate prediction of PNA synthesis quality and serves as a useful computational tool for informing PNA sequence design.
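The two agreement figures quoted in the abstract (Pearson's r and R²) can be illustrated with a minimal sketch. This is not the authors' code: the function name `pearson_r` and the sample numbers are hypothetical, chosen only for illustration, and the sketch assumes that R² is taken as the square of Pearson's r for a simple linear fit between predicted scores and measured purities.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only; NOT data from the paper.
predicted_scores = [0.2, 0.4, 0.6, 0.8]    # model-predicted synthesis scores
hplc_purity = [25.0, 45.0, 62.0, 85.0]     # measured crude purities (%)

r = pearson_r(predicted_scores, hplc_purity)
r_squared = r ** 2  # equals R² of a simple linear regression of y on x
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
```

For a one-variable linear fit, squaring r gives the fraction of variance in the measured purities that the predicted score explains, which is one common reading of a reported R² value.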
Pages: 9