Generating property-matched decoy molecules using deep learning

Cited by: 55
Authors
Imrie, Fergus [1]
Bradley, Anthony R. [2]
Deane, Charlotte M. [1]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford Prot Informat Grp, Oxford OX1 3LB, England
[2] Exscientia Ltd, Schrodinger Bldg, Oxford Sci Pk, Oxford OX4 4GE, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
DOCKING; SETS; OPTIMIZATION; DISCOVERY;
DOI
10.1093/bioinformatics/btab080
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: An essential step in the development of virtual screening methods is the use of established sets of actives and decoys for benchmarking and training. However, the decoy molecules in commonly used sets are biased, meaning that methods often exploit these biases to separate actives and decoys and do not necessarily learn to perform molecular recognition. This fundamental issue prevents generalization and hinders virtual screening method development. Results: We have developed a deep learning method (DeepCoy) that generates decoys to a user's preferred specification in order to remove such biases or construct sets with a defined bias. We validated DeepCoy using two established benchmarks, DUD-E and DEKOIS 2.0. For all 102 DUD-E targets and 80 of the 81 DEKOIS 2.0 targets, our generated decoy molecules more closely matched the active molecules' physicochemical properties while introducing no discernible additional risk of false negatives. The DeepCoy decoys improved the Deviation from Optimal Embedding (DOE) score by an average of 81% and 66%, respectively, decreasing from 0.166 to 0.032 for DUD-E and from 0.109 to 0.038 for DEKOIS 2.0. Further, the generated decoys are harder to distinguish than the original decoy molecules via docking with AutoDock Vina, with virtual screening performance falling from an AUC ROC of 0.70 to 0.63.
Pages: 2134-2141
Page count: 8