ADMET evaluation in drug discovery: 15. Accurate prediction of rat oral acute toxicity using relevance vector machine and consensus modeling

Times cited: 124
Authors
Lei, Tailong [1]
Li, Youyong [3]
Song, Yunlong [4]
Li, Dan [1]
Sun, Huiyong [1]
Hou, Tingjun [1,2]
Affiliations
[1] Zhejiang Univ, Coll Pharmaceut Sci, Hangzhou 310058, Zhejiang, Peoples R China
[2] Zhejiang Univ, State Key Lab CAD&CG, Hangzhou 310058, Zhejiang, Peoples R China
[3] Soochow Univ, Inst Funct Nano & Soft Mat FUNSOM, Suzhou 215123, Jiangsu, Peoples R China
[4] Second Mil Med Univ, Sch Pharm, Dept Med Chem, Shanghai 200433, Peoples R China
Source
JOURNAL OF CHEMINFORMATICS | 2016 / Vol. 8
Funding
National Science Foundation (USA);
Keywords
IN-SILICO PREDICTION; TRADITIONAL CHINESE MEDICINES; APPLICABILITY DOMAIN; RANDOM FOREST; INTESTINAL-ABSORPTION; BINARY CLASSIFICATION; LIKENESS ANALYSIS; QSAR; REGRESSION; CHEMICALS
DOI
10.1186/s13321-016-0117-7
Chinese Library Classification number
O6 [Chemistry]
Discipline code
0703
Abstract
Background: Determination of acute toxicity, expressed as the median lethal dose (LD50), is one of the most important steps in the drug discovery pipeline. Because in vivo assays of oral acute toxicity in mammals are time-consuming and costly, there is an urgent need for in silico models that predict oral acute toxicity. Results: In this study, based on a comprehensive data set of 7314 diverse chemicals with rat oral LD50 values, the relevance vector machine (RVM) technique was employed to build regression models for predicting oral acute toxicity in rats. These models were compared with those built using six other machine learning approaches: k-nearest-neighbor regression, random forest (RF), support vector machine, local approximate Gaussian process, multilayer perceptron ensemble, and eXtreme gradient boosting. A subset of the original molecular descriptors and structural fingerprints (PubChem or SubFP) was selected using Chi-squared statistics. The predictive capabilities of the individual QSAR models, measured by q²_ext on the test set of 2376 molecules, ranged from 0.572 to 0.659. Conclusion: Considering the overall prediction accuracy on the test set, RVM with a Laplacian kernel and RF are recommended for building in silico models with better predictivity for rat oral acute toxicity. By combining the predictions from individual models, four consensus models were developed, yielding better prediction capabilities on the test set (q²_ext = 0.669-0.689). Finally, some essential descriptors and substructures relevant to oral acute toxicity were identified and analyzed; they may serve as property or substructure alerts to avoid toxicity. We believe that the best consensus model, with its high prediction accuracy, can be used as a reliable virtual screening tool to filter out compounds with high rat oral acute toxicity.
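The abstract reports predictivity as q²_ext and builds consensus models by combining the predictions of individual models. The following is a minimal Python sketch of both ideas. It assumes that the consensus is a simple arithmetic mean of the individual predictions, and that q²_ext = 1 - PRESS / SS, where SS is computed around the training-set mean. This is one common definition; the record does not state the paper's exact formula or combination scheme. The function names and toy values are hypothetical and are not data from the study.

```python
import numpy as np


def consensus_predict(predictions):
    """Simple-mean consensus (assumed scheme): average each sample's prediction across models.

    predictions: array-like of shape (n_models, n_samples).
    """
    preds = np.asarray(predictions, dtype=float)
    return preds.mean(axis=0)


def q2_ext(y_true, y_pred, y_train_mean):
    """External q^2 (one common definition): 1 - PRESS / SS.

    PRESS is the sum of squared prediction errors on the external test set;
    SS is the sum of squared deviations of the test values from the training-set mean.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    press = np.sum((y_true - y_pred) ** 2)
    ss = np.sum((y_true - y_train_mean) ** 2)
    return 1.0 - press / ss


# Hypothetical toy example: two models' predictions for a 4-compound test set
y_test = [1.0, 2.0, 3.0, 4.0]  # e.g. -log(LD50)-style values (illustrative only)
train_mean = 2.5               # assumed mean of the training-set targets
model_a = [1.2, 2.2, 2.8, 3.6]
model_b = [0.8, 2.0, 3.4, 4.2]

combined = consensus_predict([model_a, model_b])
print(f"model A   q2_ext = {q2_ext(y_test, model_a, train_mean):.3f}")
print(f"model B   q2_ext = {q2_ext(y_test, model_b, train_mean):.3f}")
print(f"consensus q2_ext = {q2_ext(y_test, combined, train_mean):.3f}")
```

With these toy values, the individual models score 0.944 and 0.952, and their average scores 0.994. This happens because the two models' errors partly cancel. It illustrates why averaging can help, but says nothing about the magnitudes in the paper.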
Pages: 19