Development of Ligand-based Big Data Deep Neural Network Models for Virtual Screening of Large Compound Libraries

Cited by: 16
Authors
Xiao, Tao [1,2]
Qi, Xingxing [2 ]
Chen, Yuzong [3 ,4 ]
Jiang, Yuyang [2 ,5 ]
Affiliations
[1] Tsinghua Univ, Dept Chem, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Grad Sch Shenzhen, State Key Lab Chem Oncogen, Shenzhen 518055, Peoples R China
[3] Natl Univ Singapore, Dept Pharm, Bioinformat & Drug Design Grp, Singapore 117543, Singapore
[4] Shenzhen Kivita Innovat Drug Inst, Shenzhen 518055, Peoples R China
[5] Tsinghua Univ, Sch Pharmaceut Sci, Beijing 100084, Peoples R China
Keywords
deep learning; machine learning; ligand-based virtual screening; large compound library; EGFR; support vector machines; k-nearest neighbor; kinase inhibitors; classification; prediction; identification; enrichment; discovery; systems; tools
DOI
10.1002/minf.201800031
Chinese Library Classification number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
High-performance ligand-based virtual screening (VS) models have been developed using various computational methods, including the deep neural network (DNN) method. There are high expectations that the advanced capabilities of DNNs can improve VS performance, and these capabilities are best realized with large training datasets. However, the ability of large-data DNN models to screen large compound libraries has not been evaluated, so there is a need to develop and evaluate ligand-based large-data DNN VS models for large compound libraries. In this study, we developed ligand-based large-data DNN VS models for inhibitors of six anticancer targets using 0.5 M training compounds. The models were evaluated by 10-fold cross-validation, achieving 77.9-97.8 % sensitivity, 99.9-100 % specificity, a Matthews correlation coefficient of 0.82-0.98 and an area under the curve of 0.98-0.99, outperforming random forest models. Moreover, DNN VS models trained on pre-2015 inhibitors identified 50 % of post-2015 inhibitors with a 0.01-0.09 % false positive rate in screening 89 M PubChem compounds, also outperforming previous models. Experimental assays of selected virtual hits from the EGFR inhibitor model led to the identification of EGFR inhibitors with novel structures. Our results confirm the usefulness of the large-data DNN model as a ligand-based VS tool for screening large compound libraries.
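To make the reported figures concrete, the following Python sketch shows how sensitivity, specificity and the Matthews correlation coefficient follow from a confusion matrix, and what a 0.01-0.09 % false positive rate implies for an 89 M compound library. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the study. The false-positive estimate also assumes, as a simplification, that essentially every library compound is a non-inhibitor.

```python
import math


def screening_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true inhibitors recovered
    specificity = tn / (tn + fp)   # fraction of non-inhibitors rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


def expected_false_positives(library_size, false_positive_rate_percent):
    """Approximate number of false hits, treating the whole library as negatives."""
    return round(library_size * false_positive_rate_percent / 100)


# Hypothetical counts, for illustration only (not data from the study).
metrics = screening_metrics(tp=90, fn=10, tn=9990, fp=10)
print(metrics)

# False positive rates quoted in the abstract, applied to 89 M compounds.
low = expected_false_positives(89_000_000, 0.01)
high = expected_false_positives(89_000_000, 0.09)
print(low, high)
```

Under these assumptions, the quoted false positive rates correspond to roughly 8,900 to 80,100 false hits out of 89 M screened compounds, which shows why very high specificity matters at this library scale.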
Pages: 13