ADME properties evaluation in drug discovery: Prediction of plasma protein binding using NSGA-II combining PLS and consensus modeling

Cited by: 26
Authors
Wang, Ning-Ning [1]
Deng, Zhen-Ke [1]
Huang, Chen [2]
Dong, Jie [1]
Zhu, Min-Feng [3]
Yao, Zhi-Jiang [3]
Chen, Alex F. [1, 4]
Lu, Ai-Ping [5]
Mi, Qi [6]
Cao, Dong-Sheng [1, 4, 5]
Affiliations
[1] Cent South Univ, Xiangya Sch Pharmaceut Sci, Changsha 410013, Hunan, Peoples R China
[2] Cent South Univ, Sch Math & Stat, Changsha 410083, Hunan, Peoples R China
[3] Cent South Univ, Xiangya Hosp 3, Dept Haematol, Changsha, Hunan, Peoples R China
[4] Cent South Univ, Xiangya Hosp 3, Ctr Vasc Dis & Translat Med, Changsha 410013, Hunan, Peoples R China
[5] Hong Kong Baptist Univ, Inst Adv Translat Med Bone & Joint Dis, Sch Chinese Med, Hong Kong, Hong Kong, Peoples R China
[6] Univ Pittsburgh, McGowan Inst Regenerat Med, Ctr Inflammat & Regenerat Modeling, Dept Sports Med & Nutr, Pittsburgh, PA USA
Funding
National Natural Science Foundation of China
Keywords
Plasma protein binding; ADME; QSAR; NSGA-II; Consensus model; MODIFIED RANDOM FOREST; IN-SILICO PREDICTION; APPLICABILITY DOMAIN; CRYSTAL-STRUCTURE; OUTLIER DETECTION; AFFINITY; GLYCOPROTEIN; VALIDATION; REGRESSION; TOXICITY;
DOI
10.1016/j.chemolab.2017.09.005
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The plasma protein binding affinity of a drug compound strongly influences its pharmacodynamic behavior because it affects drug uptake and distribution. In this study, we collected a sizeable dataset of 1830 drug compounds from several accessible sources. A descriptor pool composed of four types of descriptors (2-D, 3-D, Estate and MACCS) was first built, and the non-dominated sorting genetic algorithm (NSGA-II) combined with partial least squares (PLS) regression was applied to select important descriptors for model building. Finally, we obtained a consensus model (for five-fold cross-validation: Q^2 = 0.750; RMSE = 16.151) based on five predictive models built using random forest (RF), support vector machine (SVM), Cubist, Gaussian process (GP), and Boosting. Further, a test set and two external validation datasets were used to validate its robustness and practicality. For the test set, R^2_T = 0.787 and RMSE_T = 14.154; for the two external datasets, R^2_Ex = 0.704 and 0.703, and RMSE_Ex = 18.194 and 17.233, respectively. Additionally, in accordance with OECD principles, y-randomization, a Williams plot and scaffold analyses were used to assess the reliability and practical applicability domain of our predictive model. Overall, our consensus model shows good prediction performance and generalization ability in predicting plasma protein binding (PPB). After analyzing the important descriptors selected by NSGA-II and RF, we concluded that the PPB of a drug compound is mainly related to its lipophilicity, aromatic rings, and partial charge properties. In summary, this study developed a robust and practical consensus model for PPB prediction that could be used for distribution evaluation and risk assessment in the early stages of drug development.
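The consensus idea and the reported statistics can be illustrated with a minimal sketch. It assumes that the consensus prediction is the unweighted mean of the individual model predictions and that Q^2 is computed as 1 - PRESS/TSS; the abstract states neither detail, so both are assumptions. The function names and the PPB values below are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean


def consensus(predictions):
    """Average the per-compound predictions of several models.

    `predictions` is a list of equally long lists, one per model.
    Unweighted averaging is an assumption, not the paper's stated rule.
    """
    return [mean(column) for column in zip(*predictions)]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def q2(y_true, y_pred):
    """Coefficient of determination, 1 - PRESS / TSS.

    It is called Q^2 when y_pred comes from cross-validation and
    R^2 when it comes from a held-out test or external set.
    """
    y_bar = mean(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - press / tss


# Hypothetical observed PPB values (% bound) for four compounds.
y_obs = [95.0, 40.0, 72.0, 10.0]

# Hypothetical predictions from three individual models.
model_preds = [
    [90.0, 50.0, 70.0, 20.0],
    [98.0, 42.0, 80.0, 12.0],
    [94.0, 34.0, 75.0, 4.0],
]

y_cons = consensus(model_preds)
print("consensus:", y_cons)
print("RMSE:", round(rmse(y_obs, y_cons), 3))
print("Q^2 / R^2:", round(q2(y_obs, y_cons), 4))
```

In this toy example the averaged prediction has a lower RMSE than any single model, which is the usual motivation for consensus modeling, although it is not guaranteed in general.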
Pages: 84-95
Page count: 12