Machine learning-enabled retrobiosynthesis of molecules

Times cited: 53
Authors
Yu, Tianhao [1, 2, 3]
Boob, Aashutosh Girish [1, 2, 4]
Volk, Michael J. [1, 2, 4]
Liu, Xuan [1, 2, 3]
Cui, Haiyang [1, 2, 3]
Zhao, Huimin [1, 2, 3, 4]
Affiliations
[1] Univ Illinois, Dept Chem & Biomol Engn, Urbana, IL 61820 USA
[2] Univ Illinois, Carl R Woese Inst Genom Biol, Urbana, IL 61820 USA
[3] Univ Illinois, NSF Mol Maker Lab Inst, Urbana, IL 61820 USA
[4] Univ Illinois, DOE Ctr Adv Bioenergy & Bioprod Innovat, Urbana, IL 61820 USA
Funding
National Science Foundation (USA);
Keywords
METABOLIC PATHWAYS; PROTEIN; PREDICTION; DESIGN; RETROSYNTHESIS; DATABASE; MODELS; KNOWLEDGEBASE; BIOCATALYSIS; OPTIMIZATION;
DOI
10.1038/s41929-022-00909-w
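Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code
070304; 081704;
Abstract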
Retrobiosynthesis provides an effective and sustainable approach to producing functional molecules. The past few decades have witnessed a rapid expansion of biosynthetic approaches. With the recent advances in data-driven sciences, machine learning (ML) is enriching the retrobiosynthesis design toolbox and being applied to each step of the synthesis design workflow, including retrosynthesis planning, enzyme identification and engineering, and pathway optimization. The ability to learn from existing knowledge, recognize complex patterns and generalize to the unknown has made ML a promising solution to biological problems. In this Review, we summarize the recent progress in the development of ML models for assisting with molecular synthesis. We highlight the key advantages of ML-based biosynthesis design methods and discuss the challenges and outlook for the further development of ML-based approaches.
Pages: 137-151
Page count: 15