Deep learning in template-free de novo biosynthetic pathway design of natural products

Cited by: 0
Authors
Xie, Xueying [1,2]
Gui, Lin [3]
Qiao, Baixue [1,2]
Wang, Guohua [3]
Huang, Shan [4]
Zhao, Yuming [3]
Sun, Shanwen [1,2]
Affiliations
[1] Northeast Forestry Univ, Key Lab Saline Alkali Vegetat Ecol Restorat, Minist Educ, 26 Hexing Rd, Harbin 150001, Peoples R China
[2] Northeast Forestry Univ, Coll Life Sci, 26 Hexing Rd, Harbin 150040, Peoples R China
[3] Northeast Forestry Univ, Coll Comp & Control Engn, 26 Hexing Rd, Harbin 150040, Peoples R China
[4] Harbin Med Univ, Affiliated Hosp 2, Dept Neurol, 246 Xuefu Rd, Harbin 150081, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
deep learning; template-free; de novo biosynthesis; natural products; MTCS; generative models; PREDICTION; ENZYME; LANGUAGE; METABOLITES; EFFICIENT; RESOURCE;
DOI
10.1093/bib/bbae495
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Natural products (NPs) are indispensable in drug development, particularly in combating infections, cancer, and neurodegenerative diseases. However, their limited availability poses significant challenges. Template-free de novo biosynthetic pathway design provides a strategic solution for NP production, with deep learning standing out as a powerful tool in this domain. This review delves into state-of-the-art deep learning algorithms in NP biosynthesis pathway design. It provides an in-depth discussion of databases like Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, and UniProt, which are essential for model training, along with chemical databases such as Reaxys, SciFinder, and PubChem for transfer learning to expand models' understanding of the broader chemical space. It evaluates the potential and challenges of sequence-to-sequence and graph-to-graph translation models for accurate single-step prediction. Additionally, it discusses search algorithms for multistep prediction and deep learning algorithms for predicting enzyme function. The review also highlights the pivotal role of deep learning in improving catalytic efficiency through enzyme engineering, which is essential for enhancing NP production. Moreover, it examines the application of large language models in pathway design, enzyme discovery, and enzyme engineering. Finally, it addresses the challenges and prospects associated with template-free approaches, offering insights into potential advancements in NP biosynthesis pathway design.
Pages: 19