Applying Machine Learning to Ultrafast Shape Recognition in Ligand-Based Virtual Screening

Cited by: 17
Authors
Bonanno, Etienne [1]
Ebejer, Jean-Paul [2]
Affiliations
[1] Univ Malta, Dept Artificial Intelligence, Msida, Malta
[2] Univ Malta, Ctr Mol Med & Biobanking, Msida, Malta
Keywords
virtual screening; machine learning; ultrafast shape recognition; ligand-based virtual screening; ligand similarity; ElectroShape; speaker identification; Gaussian description; distance geometry; search
DOI
10.3389/fphar.2019.01675
Chinese Library Classification (CLC) number
R9 [Pharmacy]
Subject classification code
1007
Abstract
Ultrafast Shape Recognition (USR) and its derivatives are Ligand-Based Virtual Screening (LBVS) methods that condense 3-dimensional information about molecular shape, as well as other properties, into a small set of numeric descriptors. These descriptors can be used to efficiently compute a measure of similarity between pairs of molecules using a simple inverse Manhattan distance metric. In this study we explore Machine Learning techniques that can be trained on USR descriptors to improve the detection of potential new leads by similarity. We use molecules from the Directory of Useful Decoys, Enhanced (DUD-E) to construct machine learning models based on three different algorithms: Gaussian Mixture Models (GMMs), Isolation Forests and Artificial Neural Networks (ANNs). We train models on the full set of conformers of each molecule, as well as on the Lowest Energy Conformations (LECs) only. We also investigate the performance of our models when trained on smaller datasets, so as to model virtual screening scenarios in which only a small number of actives are known a priori. Our results indicate significant performance gains over a state-of-the-art USR-derived method, ElectroShape 5D: GMMs obtain a mean performance up to 430% better than that of ElectroShape 5D in terms of Enrichment Factor, with a maximum improvement of up to 940%. Additionally, we demonstrate that our models maintain their performance, in terms of Enrichment Factor, within 10% of the mean as the size of the training dataset is successively reduced. Furthermore, we demonstrate that retrospective screening with the machine learning models we selected runs faster than standard USR, on average by a factor of 10, including the time required for training. Our results show that machine learning techniques can significantly improve the virtual screening performance and efficiency of the USR family of methods.
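The two quantities the abstract relies on, the inverse Manhattan similarity and the Enrichment Factor, can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the normalised similarity form usually quoted for USR, 1 / (1 + mean absolute descriptor difference), and a common definition of Enrichment Factor (hit rate in the top-ranked fraction divided by the hit rate in the whole library). The function names `usr_similarity` and `enrichment_factor` and the toy values are hypothetical.

```python
def usr_similarity(a, b):
    """Inverse Manhattan similarity between two equal-length descriptor vectors.

    Assumed form: 1 / (1 + mean(|a_i - b_i|)); identical vectors score 1.0.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("descriptor vectors must be non-empty and equal in length")
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mean_abs_diff)


def enrichment_factor(scores, is_active, fraction=0.01):
    """Hit rate among the top-scoring fraction divided by the overall hit rate.

    Assumed definition; `scores` are similarity scores (higher is better) and
    `is_active` marks the known actives in the screened library.
    """
    n = len(scores)
    total_hits = sum(1 for flag in is_active if flag)
    if n == 0 or total_hits == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, round(fraction * n))
    # Rank molecule indices from most to least similar to the query.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top_hits = sum(1 for i in ranked[:n_top] if is_active[i])
    return (top_hits / n_top) / (total_hits / n)


# Toy usage: one query scored against a four-molecule library (made-up values).
query = [0.0, 0.0]
library = [[0.0, 0.0], [1.0, 3.0], [2.0, 2.0], [4.0, 4.0]]
actives = [True, False, False, False]
scores = [usr_similarity(query, mol) for mol in library]
ef_25 = enrichment_factor(scores, actives, fraction=0.25)
```

Real USR descriptors have 12 components per molecule, and ElectroShape variants use more; the same functions apply to vectors of any shared length.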
Pages: 18