In silico classification of human maximum recommended daily dose based on modified random forest and substructure fingerprint

Cited by: 34
Authors
Cao, Dong-Sheng [1]
Hu, Qian-Nan [2]
Xu, Qing-Song [3]
Yang, Yan-Ning [1]
Zhao, Jian-Chao [1]
Lu, Hong-Mei [1]
Zhang, Liang-Xiao [1]
Liang, Yi-Zeng [1]
Affiliations
[1] Cent S Univ, Res Ctr Modernizat Tradit Chinese Med, Changsha 410083, Hunan, Peoples R China
[2] Wuhan Univ, Syst Drug Design Lab, Coll Pharm, Wuhan 430071, Peoples R China
[3] Cent S Univ, Sch Math Sci & Comp Technol, Changsha 410083, Hunan, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Maximum recommended daily dose (MRDD); Drug toxicity; Modified random forest; Substructure fingerprint; Machine learning; Regression
DOI
10.1016/j.aca.2011.02.010
CLC classification number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
A modified random forest (RF) algorithm, a novel machine learning technique, was developed to estimate the maximum recommended daily dose (MRDD) of compounds in a large and diverse pharmaceutical dataset for phase I human trials, using substructure fingerprint descriptors calculated from molecular structure alone. This novel type of molecular descriptor encodes molecular structure as a series of binary bits that represent the presence or absence of particular substructures in the molecule, and can thereby accurately and directly capture local structural information. Two model validation approaches, 5-fold cross-validation and an independent validation set, were used to assess the prediction capability of the models. The modified RF gave a prediction accuracy of 80.45%, sensitivity of 75.08%, and specificity of 84.85% for 5-fold cross-validation, and a prediction accuracy of 80.5%, sensitivity of 76.47%, and specificity of 83.48% for the independent validation set, which are on the whole better than those obtained with the original RF. In addition, the important substructure fingerprints identified by the RF technique gave some insight into the structural features related to the toxicity of pharmaceuticals, which could help give medicinal chemists an intuitive understanding. (C) 2011 Published by Elsevier B.V.
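As a rough illustration of two ideas in the abstract, the sketch below encodes a molecule as presence/absence bits and computes accuracy, sensitivity, and specificity from binary predictions. It is not the authors' implementation: the substructure key names, the function names, and the choice of label 1 as the positive class are all assumptions made for the example, and the toy labels are not data from the paper.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical substructure keys; the paper's actual fingerprint
# dictionary is not given in this record.
SUBSTRUCTURE_KEYS = ("nitro", "aromatic_ring", "carboxylic_acid", "halogen")


def encode_fingerprint(present: Iterable[str],
                       keys: Sequence[str] = SUBSTRUCTURE_KEYS) -> List[int]:
    """Return one bit per key: 1 if that substructure is present, else 0."""
    found = set(present)
    return [1 if key in found else 0 for key in keys]


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for 0/1 labels.

    Assumes label 1 is the positive class (an arbitrary choice here).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Avoid division by zero when a class is absent.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
    }


# Toy inputs, made up for illustration only.
bits = encode_fingerprint(["nitro", "halogen"])
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                                 [1, 0, 1, 0, 0, 1, 0, 1])
```

In a 5-fold cross-validation setting, the same three metrics would be computed on the held-out predictions of each fold or on the pooled held-out predictions; the record does not say which of these the authors used.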
Pages: 50-56
Page count: 7