ADME properties evaluation in drug discovery: Prediction of plasma protein binding using NSGA-II combining PLS and consensus modeling
Cited by: 26
Authors:
Wang, Ning-Ning [1]
Deng, Zhen-Ke [1]
Huang, Chen [2]
Dong, Jie [1]
Zhu, Min-Feng [3]
Yao, Zhi-Jiang [3]
Chen, Alex F. [1,4]
Lu, Ai-Ping [5]
Mi, Qi [6]
Cao, Dong-Sheng [1,4,5]
Affiliations:
[1] Cent South Univ, Xiangya Sch Pharmaceut Sci, Changsha 410013, Hunan, Peoples R China
[2] Cent South Univ, Sch Math & Stat, Changsha 410083, Hunan, Peoples R China
[3] Cent South Univ, Xiangya Hosp 3, Dept Haematol, Changsha, Hunan, Peoples R China
[4] Cent South Univ, Xiangya Hosp 3, Ctr Vasc Dis & Translat Med, Changsha 410013, Hunan, Peoples R China
[5] Hong Kong Baptist Univ, Inst Adv Translat Med Bone & Joint Dis, Sch Chinese Med, Hong Kong, Hong Kong, Peoples R China
[6] Univ Pittsburgh, McGowan Inst Regenerat Med, Ctr Inflammat & Regenerat Modeling, Dept Sports Med & Nutr, Pittsburgh, PA USA
Funding:
National Natural Science Foundation of China;
Keywords:
Plasma protein binding;
ADME;
QSAR;
NSGA-II;
Consensus model;
MODIFIED RANDOM FOREST;
IN-SILICO PREDICTION;
APPLICABILITY DOMAIN;
CRYSTAL-STRUCTURE;
OUTLIER DETECTION;
AFFINITY;
GLYCOPROTEIN;
VALIDATION;
REGRESSION;
TOXICITY;
DOI:
10.1016/j.chemolab.2017.09.005
Chinese Library Classification number:
TP [Automation technology, computer technology];
Discipline classification code:
0812;
Abstract:
The plasma protein binding affinity of a drug compound strongly influences its pharmacodynamic behavior because it affects drug uptake and distribution. In this study, we collected a sizeable dataset of 1830 drug compounds from several accessible sources. A descriptor pool composed of four types of descriptors (2-D, 3-D, Estate and MACCS) was first built, and the non-dominated sorting genetic algorithm (NSGA-II) combined with partial least squares (PLS) regression was applied to select important descriptors for model building. Finally, we obtained a consensus model (for five-fold cross-validation: $Q^2$ = 0.750; RMSE = 16.151) based on five predictive models built using random forest (RF), support vector machine (SVM), Cubist, Gaussian process (GP), and Boosting. Further, a test set and two external validation datasets were used to validate its robustness and practicality. For the test set, $R_T^2$ = 0.787 and $RMSE_T$ = 14.154; for the two external datasets, $R_{Ex}^2$ = 0.704 and 0.703, and $RMSE_{Ex}$ = 18.194 and 17.233, respectively. Additionally, in accordance with OECD principles, y-randomization, Williams plot and scaffold analyses were used to assess the reliability and applicability domain of our predictive model. Overall, our consensus model shows good prediction performance and generalization ability in predicting plasma protein binding (PPB). After analyzing the important descriptors selected by NSGA-II and RF, we concluded that the PPB of a drug compound is mainly related to its lipophilicity, aromatic rings, and partial charge properties. In summary, this study developed a robust and practical consensus model for PPB prediction that could be used for distribution evaluation and risk assessment in the early stage of drug development.
Pages: 84-95
Number of pages: 12