A combined drug discovery strategy based on machine learning and molecular docking

Cited by: 24
Authors
Zhang, Yanmin [1]
Wang, Yuchen [1]
Zhou, Weineng [1]
Fan, Yuanrong [1]
Zhao, Junnan [1]
Zhu, Lu [1]
Lu, Shuai [1]
Lu, Tao [1, 2]
Chen, Yadong [1]
Liu, Haichun [1]
Affiliations
[1] China Pharmaceut Univ, Sch Sci, Lab Mol Design & Drug Discovery, Nanjing, Jiangsu, Peoples R China
[2] China Pharmaceut Univ, State Key Lab Nat Med, Nanjing, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
ACC inhibitors; extremely randomized trees; machine learning; molecular docking; RANDOM FOREST; ACETYL-COENZYME; CARBOXYLASE; INHIBITORS; QSAR; CLASSIFICATION; PREDICTION; CHEMISTRY; CANCER;
DOI
10.1111/cbdd.13494
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Data mining methods based on machine learning play an increasingly important role in drug design and discovery. In this work, eight machine learning methods (decision trees, k-nearest neighbors, support vector machines, random forests, extremely randomized trees, AdaBoost, gradient boosting trees, and XGBoost) were evaluated comprehensively through a case study of ACC inhibitor data sets. Internal and external data sets were employed for cross-validation of the eight methods. The extremely randomized trees model performed best and was adopted as the first step of virtual screening. Together with structure-based virtual screening in the second step, this combined strategy obtained desirable results. This work indicates that combining machine learning methods with traditional structure-based virtual screening can effectively strengthen the ability to find potential hits for a given target in large compound databases.
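The two-step strategy described in the abstract can be pictured as a simple filtering funnel. The sketch below is illustrative only and is not the authors' pipeline: the compound names, the classifier probabilities, the docking scores, the 0.5 probability cutoff, and the `two_stage_screen` helper are all hypothetical. It assumes that a trained classifier (for example an extremely randomized trees model) has already assigned each compound a probability of being active, and that a docking program has already produced a score where more negative means better.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Compound:
    name: str
    ml_active_prob: float  # hypothetical classifier probability of being active
    docking_score: float   # hypothetical docking score; more negative is better


def two_stage_screen(library: List[Compound],
                     prob_cutoff: float = 0.5,
                     top_n: int = 2) -> List[Compound]:
    """Keep compounds the classifier calls active, then rank them by docking."""
    # Step 1: machine-learning filter (ligand-based).
    stage1 = [c for c in library if c.ml_active_prob >= prob_cutoff]
    # Step 2: structure-based ranking of the survivors, best score first.
    ranked = sorted(stage1, key=lambda c: c.docking_score)
    return ranked[:top_n]


# Hypothetical toy library; none of these values come from the paper.
library = [
    Compound("cmpd_A", 0.91, -7.2),
    Compound("cmpd_B", 0.35, -9.8),
    Compound("cmpd_C", 0.78, -8.5),
    Compound("cmpd_D", 0.62, -6.1),
    Compound("cmpd_E", 0.12, -5.0),
]

hits = two_stage_screen(library, prob_cutoff=0.5, top_n=2)
hit_names = [c.name for c in hits]
print(hit_names)
```

In this toy library, `cmpd_B` has the best docking score but never reaches the docking step because the classifier filter rejects it first. That ordering is the point of running the inexpensive machine learning model before the more costly structure-based step.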
Pages: 685-699
Number of pages: 15