In Silico Prediction of Blood-Brain Barrier Permeability of Compounds by Machine Learning and Resampling Methods

Citations: 139
Authors
Wang, Zhuang [1]
Yang, Hongbin [1]
Wu, Zengrui [1]
Wang, Tianduanyi [1]
Li, Weihua [1]
Tang, Yun [1]
Liu, Guixia [1]
Affiliation
[1] East China Univ Sci & Technol, Sch Pharm, Shanghai Key Lab New Drug Design, Shanghai 200237, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
blood-brain barrier; imbalanced data; machine learning; QSAR models; resampling methods; drug discovery; applicability domain; ADME evaluation; classification; penetration; models; area; tool
DOI
10.1002/cmdc.201800533
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
The blood-brain barrier (BBB), considered here as part of absorption, protects the central nervous system by separating brain tissue from the bloodstream. In recent years, BBB permeability has become a critical issue in chemical ADMET prediction, but almost all models were built using imbalanced data sets, which caused a high false-positive rate. We therefore addressed the problem of biased data sets and built a reliable classification model with 2358 compounds. Machine learning and resampling methods were applied together to refine the models, with both 2D molecular descriptors and molecular fingerprints used to represent the chemicals. Through a series of evaluations, we found that resampling methods such as the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE combined with edited nearest neighbor could effectively address the problem of imbalanced data sets, and that the MACCS fingerprint combined with a support vector machine performed best. After construction of the final consensus model, the overall accuracy increased to 0.966 on the final external data set. The accuracy of the model on the test set was 0.919, with a well-balanced capacity to predict BBB-positive compounds (sensitivity 0.925) and BBB-negative compounds (specificity 0.899). Compared with other BBB classification models, our models reduced the false-positive rate and were more robust in predicting both BBB-positive and BBB-negative compounds, which should be helpful in early drug discovery.
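The resampling idea named in the abstract can be illustrated with a minimal sketch of SMOTE-style oversampling. This is not the authors' implementation: the function name `smote`, its parameters, the toy descriptor vectors, and the choice of which class is the minority are all assumptions made for illustration. It shows only the core step, creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (core SMOTE step)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy, made-up 2D descriptor vectors (hypothetical, not data from the paper).
majority = [[0.1 * i, 1.0 + 0.05 * i] for i in range(12)]
minority = [[3.0, 0.2], [3.2, 0.1], [2.9, 0.4], [3.1, 0.3]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class. In practice a maintained library implementation would normally be preferred over a hand-written one.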
Pages: 2189-2201
Page count: 13