Deep learning in template-free de novo biosynthetic pathway design of natural products

Times cited: 0
Authors
Xie, Xueying [1, 2]
Gui, Lin [3]
Qiao, Baixue [1, 2]
Wang, Guohua [3]
Huang, Shan [4]
Zhao, Yuming [3]
Sun, Shanwen [1, 2]
Affiliations
[1] Northeast Forestry Univ, Key Lab Saline Alkali Vegetat Ecol Restorat, Minist Educ, 26 Hexing Rd, Harbin 150001, Peoples R China
[2] Northeast Forestry Univ, Coll Life Sci, 26 Hexing Rd, Harbin 150040, Peoples R China
[3] Northeast Forestry Univ, Coll Comp & Control Engn, 26 Hexing Rd, Harbin 150040, Peoples R China
[4] Harbin Med Univ, Affiliated Hosp 2, Dept Neurol, 246 Xuefu Rd, Harbin 150081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
deep learning; template-free; de novo biosynthesis; natural products; MTCS; generative models; PREDICTION; ENZYME; LANGUAGE; METABOLITES; EFFICIENT; RESOURCE;
DOI
10.1093/bib/bbae495
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Natural products (NPs) are indispensable in drug development, particularly in combating infections, cancer, and neurodegenerative diseases. However, their limited availability poses significant challenges. Template-free de novo biosynthetic pathway design provides a strategic solution for NP production, with deep learning standing out as a powerful tool in this domain. This review delves into state-of-the-art deep learning algorithms in NP biosynthesis pathway design. It provides an in-depth discussion of databases like Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, and UniProt, which are essential for model training, along with chemical databases such as Reaxys, SciFinder, and PubChem for transfer learning to expand models' understanding of the broader chemical space. It evaluates the potential and challenges of sequence-to-sequence and graph-to-graph translation models for accurate single-step prediction. Additionally, it discusses search algorithms for multistep prediction and deep learning algorithms for predicting enzyme function. The review also highlights the pivotal role of deep learning in improving catalytic efficiency through enzyme engineering, which is essential for enhancing NP production. Moreover, it examines the application of large language models in pathway design, enzyme discovery, and enzyme engineering. Finally, it addresses the challenges and prospects associated with template-free approaches, offering insights into potential advancements in NP biosynthesis pathway design.
Pages: 19