Leveraging language model for advanced multiproperty molecular optimization via prompt engineering

Cited by: 1
Authors
Wu, Zhenxing [1, 2]
Zhang, Odin [1, 2]
Wang, Xiaorui [3]
Fu, Li [4]
Zhao, Huifeng [1, 2]
Wang, Jike [1, 2]
Du, Hongyan [1]
Jiang, Dejun [1, 2]
Deng, Yafeng [2]
Cao, Dongsheng [4]
Hsieh, Chang-Yu [1]
Hou, Tingjun [1]
Affiliations
[1] Zhejiang Univ, Coll Pharmaceut Sci, Innovat Inst Artificial Intelligence Med, Hangzhou, Peoples R China
[2] CarbonSilicon AI Technol Co Ltd, Hangzhou, Peoples R China
[3] Macau Univ Sci & Technol, Macau Inst Appl Res Med & Hlth, Dr Nehers Biophys Lab Innovat Drug Discovery, State Key Lab Qual Res Chinese Med, Macau, Peoples R China
[4] Cent South Univ, Xiangya Sch Pharmaceut Sci, Changsha, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
PREDICTION; DISCOVERY; EFFICIENT; ACCURATE;
DOI
10.1038/s42256-024-00916-5
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Optimizing a candidate molecule's physicochemical and functional properties has been a critical task in drug and material design. Although the non-trivial task of balancing multiple (potentially conflicting) optimization objectives is considered ideal for artificial intelligence, several technical challenges such as the scarcity of multiproperty-labelled training data have hindered the development of a satisfactory AI solution for a long time. Prompt-MolOpt is a tool for molecular optimization; it makes use of prompt-based embeddings, as used in large language models, to improve the transformer's ability to optimize molecules for specific property adjustments. Notably, Prompt-MolOpt excels in working with limited multiproperty data (even under the zero-shot setting) by effectively generalizing causal relationships learned from single-property datasets. In comparative evaluations against established models such as JTNN, hierG2G and Modof, Prompt-MolOpt achieves over a 15% relative improvement in multiproperty optimization success rates compared with the leading Modof model. Furthermore, a variant of Prompt-MolOpt, named Prompt-MolOptP, can preserve the pharmacophores or any user-specified fragments under the structural transformation, further broadening its application scope. By constructing tailored optimization datasets, with the protocol introduced in this work, Prompt-MolOpt steers molecular optimization towards domain-relevant chemical spaces, enhancing the quality of the optimized molecules. Real-world tests, such as those involving blood-brain barrier permeability optimization, underscore its practical relevance. Prompt-MolOpt offers a versatile approach for multiproperty and multi-site molecular optimizations, suggesting its potential utility in chemistry research and drug and material discovery.
Designing molecules in drug design is challenging as it requires optimizing multiple, potentially competing qualities. Wu and colleagues present a prompt-based molecule optimization method that can be trained from single-property data.
Pages: 1359-1369
Number of pages: 15