ADME properties evaluation in drug discovery: Prediction of plasma protein binding using NSGA-II combining PLS and consensus modeling

Cited by: 26

Authors:

Wang, Ning-Ning ^{[1]}

Deng, Zhen-Ke ^{[1]}

Huang, Chen ^{[2]}

Dong, Jie ^{[1]}

Zhu, Min-Feng ^{[3]}

Yao, Zhi-Jiang ^{[3]}

Chen, Alex F. ^{[1,4]}

Lu, Ai-Ping ^{[5]}

Mi, Qi ^{[6]}

Cao, Dong-Sheng ^{[1,4,5]}

Affiliations:

[1] Cent South Univ, Xiangya Sch Pharmaceut Sci, Changsha 410013, Hunan, Peoples R China

[2] Cent South Univ, Sch Math & Stat, Changsha 410083, Hunan, Peoples R China

[3] Cent South Univ, Xiangya Hosp 3, Dept Haematol, Changsha, Hunan, Peoples R China

[4] Cent South Univ, Xiangya Hosp 3, Ctr Vasc Dis & Translat Med, Changsha 410013, Hunan, Peoples R China

[5] Hong Kong Baptist Univ, Inst Adv Translat Med Bone & Joint Dis, Sch Chinese Med, Hong Kong, Hong Kong, Peoples R China

[6] Univ Pittsburgh, McGowan Inst Regenerat Med, Ctr Inflammat & Regenerat Modeling, Dept Sports Med & Nutr, Pittsburgh, PA USA

Source:

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS | 2017 / Volume 170

Funding:

National Natural Science Foundation of China;

Keywords:

Plasma protein binding; ADME; QSAR; NSGA-II; Consensus model; MODIFIED RANDOM FOREST; IN-SILICO PREDICTION; APPLICABILITY DOMAIN; CRYSTAL-STRUCTURE; OUTLIER DETECTION; AFFINITY; GLYCOPROTEIN; VALIDATION; REGRESSION; TOXICITY;

DOI:

10.1016/j.chemolab.2017.09.005

Chinese Library Classification:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

The plasma protein binding affinity of a drug compound has a strong influence on its pharmacodynamic behavior because it can affect drug uptake and distribution. In this study, we collected a sizeable dataset of 1830 drug compounds from several accessible sources. A descriptor pool composed of four different types of descriptors (2-D, 3-D, Estate and MACCS) was first built, and the non-dominated sorting genetic algorithm (NSGA-II) combined with partial least squares (PLS) regression was applied to select important descriptors for model building. Finally, we obtained a consensus model (for five-fold cross-validation: Q^{2} = 0.750; RMSE = 16.151) based on five different predictive models built using random forest (RF), support vector machine (SVM), Cubist, Gaussian process (GP), and Boosting. Further, a test set and two external validation datasets were applied to validate its robustness and practicality. For the test set, R_{T}^{2} = 0.787 and RMSE_{T} = 14.154; for the two external datasets, R_{Ex}^{2} = 0.704 and 0.703, and RMSE_{Ex} = 18.194 and 17.233, respectively. Additionally, in accordance with OECD principles, y-randomization, Williams plot and scaffold analyses were used to validate the reliability and practical application domain of our predictive model. Overall, our consensus model shows good prediction performance and generalization ability in predicting plasma protein binding (PPB). After analyzing the important descriptors selected by NSGA-II and RF, we concluded that the PPB of a drug compound is mainly related to its lipophilicity, aromatic rings, and partial charge properties. In summary, this study developed a robust and practical consensus model for PPB prediction that could be used for distribution evaluation and risk assessment in the early stage of drug development.
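As an illustration of the statistics quoted in the abstract, the following minimal sketch shows how a consensus prediction and its RMSE and Q² could be computed. It is not the authors' implementation: the abstract does not say how the five models are combined, so an unweighted mean is assumed here, Q² is taken as the common 1 - PRESS/TSS form, and all function names and toy values are hypothetical.

```python
import numpy as np

def consensus(predictions):
    """Average per-model predictions (assumed: unweighted mean).

    predictions has shape (n_models, n_samples).
    """
    return np.mean(np.asarray(predictions, dtype=float), axis=0)

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def q2(y_true, y_pred):
    """Assumed definition: 1 - PRESS / TSS.

    y_pred should hold cross-validated (out-of-fold) predictions.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    press = np.sum((y_true - y_pred) ** 2)
    tss = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - press / tss)

# Toy PPB values in percent bound (made up for illustration only).
y_obs = np.array([90.0, 50.0, 10.0, 70.0])
model_preds = np.array([
    [80.0, 60.0, 20.0, 60.0],  # hypothetical model A
    [90.0, 50.0, 10.0, 70.0],  # hypothetical model B
])

y_cons = consensus(model_preds)
cons_rmse = rmse(y_obs, y_cons)
cons_q2 = q2(y_obs, y_cons)
```

In practice the rows of `model_preds` would be the out-of-fold predictions of each base learner (RF, SVM, Cubist, GP, Boosting), so the resulting Q² reflects cross-validated rather than fitted performance.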

Citation:

Pages: 84-95

Page count: 12

References (78 in total):

[1] Effects of digoxin on morbidity and mortality in diastolic heart failure: The ancillary Digitalis Investigation Group trial [J].

Ahmed, Ali ;

Rich, Michael W. ;

Fleg, Jerome L. ;

Zile, Michael R. ;

Young, James B. ;

Kitzman, Dalane W. ;

Love, Thomas E. ;

Aronow, Wilbert S. ;

Adams, Kirkwood F., Jr. ;

Gheorghiade, Mihai .

CIRCULATION, 2006, 114 (05) :397-403

[2] The properties of known drugs .1. Molecular frameworks [J].

Bemis, GW ;

Murcko, MA .

JOURNAL OF MEDICINAL CHEMISTRY, 1996, 39 (15) :2887-2893

[3] Random forests [J].

Breiman, L .

MACHINE LEARNING, 2001, 45 (01) :5-32

[4] Ensemble partial least squares regression for descriptor selection, outlier detection, applicability domain assessment, and ensemble modeling in QSAR/QSPR modeling [J].

Cao, Dong-Sheng ;

Deng, Zhen-Ke ;

Zhu, Min-Feng ;

Yao, Zhi-Jiang ;

Dong, Jie ;

Zhao, Rui-Gang .

JOURNAL OF CHEMOMETRICS, 2017, 31 (11)

[5] In silico toxicity prediction of chemicals from EPA toxicity database by kernel fusion-based support vector machines [J].

Cao, Dong-Sheng ;

Dong, Jie ;

Wang, Ning-Ning ;

Wen, Ming ;

Deng, Bai-Chuan ;

Zeng, Wen-Bin ;

Xu, Qing-Song ;

Liang, Yi-Zeng ;

Lu, Ai-Ping ;

Chen, Alex F. .

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2015, 146 :494-502

[6] ChemoPy: freely available python package for computational biology and chemoinformatics [J].

Cao, Dong-Sheng ;

Xu, Qing-Song ;

Hu, Qian-Nan ;

Liang, Yi-Zeng .

BIOINFORMATICS, 2013, 29 (08) :1092-1094

[7] Computer-aided prediction of toxicity with substructure pattern and random forest [J].

Cao, Dong-Sheng ;

Yang, Yan-Ning ;

Zhao, Jian-Chao ;

Yan, Jun ;

Liu, Shao ;

Hu, Qian-Nan ;

Xu, Qing-Song ;

Liang, Yi-Zeng .

JOURNAL OF CHEMOMETRICS, 2012, 26 (01) :7-15

[8] In silico classification of human maximum recommended daily dose based on modified random forest and substructure fingerprint [J].

Cao, Dong-Sheng ;

Hu, Qian-Nan ;

Xu, Qing-Song ;

Yang, Yan-Ning ;

Zhao, Jian-Chao ;

Lu, Hong-Mei ;

Zhang, Liang-Xiao ;

Liang, Yi-Zeng .

ANALYTICA CHIMICA ACTA, 2011, 692 (1-2) :50-56

[9] Exploring nonlinear relationships in chemical data using kernel-based methods [J].

Cao, Dong-Sheng ;

Liang, Yi-Zeng ;

Xu, Qing-Song ;

Hu, Qian-Nan ;

Zhang, Liang-Xiao ;

Fu, Guang-Hui .

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2011, 107 (01) :106-115

[10] Prediction of aqueous solubility of druglike organic compounds using partial least squares, back-propagation network and support vector machine [J].

Cao, Dong-Sheng ;

Xu, Qing-Song ;

Liang, Yi-Zeng ;

Chen, Xian ;

Li, Hong-Dong .

JOURNAL OF CHEMOMETRICS, 2010, 24 (9-10) :584-595
