Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study

Cited by: 0
Authors
Morgan Thomas
Robert T. Smith
Noel M. O’Boyle
Chris de Graaf
Andreas Bender
Affiliations
[1] Centre for Molecular Informatics, Department of Chemistry, University of Cambridge
[2] Computational Chemistry
[3] Sosei Heptares
Source
Journal of Cheminformatics, volume 13
Keywords
Artificial Intelligence; AI; Structure-based drug design; SBDD; Ligand-based drug design; LBDD; Deep learning; Generative models; Recurrent neural network; Molecular docking; Reinforcement learning; De novo design; Quantitative structure–activity relationship; QSAR
DOI
Not available
Abstract
Deep generative models have shown the ability to devise both valid and novel chemistry, which could significantly accelerate the identification of bioactive compounds. Many current models, however, use molecular descriptors or ligand-based predictive methods to guide molecule generation towards a desirable property space. This restricts their application to relatively data-rich targets and neglects those where too little data is available to train a predictor. Moreover, ligand-based approaches often bias molecule generation towards previously established chemical space, thereby limiting their ability to identify truly novel chemotypes. In this work, we assess molecular docking via Glide, a structure-based approach, as a scoring function to guide the deep generative model REINVENT, and we compare model performance and behaviour to those obtained with a ligand-based scoring function. Additionally, we modify the previously published MOSES benchmarking dataset to remove any induced bias towards non-protonatable groups. We also propose a new metric to measure dataset diversity, which is less confounded by the distribution of heavy atom count than the commonly used internal diversity metric. Our main finding is that, when optimizing the docking score against DRD2, the model improves predicted ligand affinity beyond that of known DRD2 active molecules. In addition, generated molecules occupy chemical and physicochemical space complementary to that of the ligand-based approach, and novel physicochemical space compared to known DRD2 active molecules. Furthermore, the structure-based approach learns to generate molecules that satisfy crucial residue interactions, which is information only available when taking protein structure into account.
Overall, this work demonstrates the advantage of using molecular docking to guide de novo molecule generation over ligand-based predictors with respect to predicted affinity, novelty, and the ability to identify key interactions between ligand and protein target. Practically, this approach has applications in early hit generation campaigns to enrich a virtual library towards a particular target, and also in novelty-focused projects, where de novo molecule generation either has no prior ligand knowledge available or should not be biased by it.
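The abstract refers to the commonly used internal diversity metric without defining it. As a rough illustration only, one common formulation takes one minus the mean pairwise Tanimoto similarity between molecular fingerprints. The sketch below assumes fingerprints are represented as plain Python sets of on-bit indices and averages over distinct pairs; the function names and toy data are hypothetical, and this is neither the paper's code nor the new diversity metric it proposes.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical (an assumption of this sketch).
    return len(a & b) / union if union else 1.0


def internal_diversity(fingerprints: list) -> float:
    """One minus the mean pairwise Tanimoto similarity over distinct pairs."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        # Fewer than two molecules: no pairs to compare, so report zero diversity.
        return 0.0
    mean_similarity = sum(tanimoto(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity


# Toy fingerprints (hypothetical on-bit indices, not real molecules).
toy = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]
print(internal_diversity(toy))
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets; the sets here only keep the example self-contained.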