Deep learning in template-free de novo biosynthetic pathway design of natural products

Cited by: 0
Authors
Xie, Xueying [1, 2]
Gui, Lin [3]
Qiao, Baixue [1, 2]
Wang, Guohua [3]
Huang, Shan [4]
Zhao, Yuming [3]
Sun, Shanwen [1, 2]
Affiliations
[1] Northeast Forestry Univ, Key Lab Saline Alkali Vegetat Ecol Restorat, Minist Educ, 26 Hexing Rd, Harbin 150001, Peoples R China
[2] Northeast Forestry Univ, Coll Life Sci, 26 Hexing Rd, Harbin 150040, Peoples R China
[3] Northeast Forestry Univ, Coll Comp & Control Engn, 26 Hexing Rd, Harbin 150040, Peoples R China
[4] Harbin Med Univ, Affiliated Hosp 2, Dept Neurol, 246 Xuefu Rd, Harbin 150081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
deep learning; template-free; de novo biosynthesis; natural products; MTCS; generative models; PREDICTION; ENZYME; LANGUAGE; METABOLITES; EFFICIENT; RESOURCE;
DOI
10.1093/bib/bbae495
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Natural products (NPs) are indispensable in drug development, particularly in combating infections, cancer, and neurodegenerative diseases. However, their limited availability poses significant challenges. Template-free de novo biosynthetic pathway design provides a strategic solution for NP production, with deep learning standing out as a powerful tool in this domain. This review delves into state-of-the-art deep learning algorithms in NP biosynthesis pathway design. It provides an in-depth discussion of databases like Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, and UniProt, which are essential for model training, along with chemical databases such as Reaxys, SciFinder, and PubChem for transfer learning to expand models' understanding of the broader chemical space. It evaluates the potential and challenges of sequence-to-sequence and graph-to-graph translation models for accurate single-step prediction. Additionally, it discusses search algorithms for multistep prediction and deep learning algorithms for predicting enzyme function. The review also highlights the pivotal role of deep learning in improving catalytic efficiency through enzyme engineering, which is essential for enhancing NP production. Moreover, it examines the application of large language models in pathway design, enzyme discovery, and enzyme engineering. Finally, it addresses the challenges and prospects associated with template-free approaches, offering insights into potential advancements in NP biosynthesis pathway design.
Pages: 19