Assessing the calibration in toxicological in vitro models with conformal prediction

Authors
Andrea Morger
Fredrik Svensson
Staffan Arvidsson McShane
Niharika Gauraha
Ulf Norinder
Ola Spjuth
Andrea Volkamer
Affiliations
[1] Charité Universitätsmedizin, In Silico Toxicology and Structural Bioinformatics, Institute of Physiology
[2] Alzheimer’s Research UK UCL Drug Discovery Institute, Department of Pharmaceutical Biosciences and Science for Life Laboratory
[3] Uppsala University, Division of Computational Science and Technology
[4] KTH, Dept. Computer and Systems Sciences
[5] Stockholm University, MTM Research Centre, School of Science and Technology
[6] Örebro University
Source
Journal of Cheminformatics, volume 13
Keywords
Toxicity prediction; Conformal prediction; Data drifts; Applicability domain; Calibration plots; Tox21 datasets
DOI
Not available
Abstract
Machine learning methods are widely used in drug discovery and toxicity prediction. Although they generally perform well in cross-validation studies, their predictive power often drops when the query samples have drifted away from the descriptor space of the training data. Thus, the assumption underlying machine learning algorithms, that training and test data stem from the same distribution, may not always be fulfilled. In this work, conformal prediction is used to assess the calibration of the models. Deviations from the expected error rate may indicate that training and test data originate from different distributions. Using the Tox21 datasets as an example, which comprise the chronologically released Tox21Train, Tox21Test and Tox21Score subsets, we observed that while internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve predictions on the external sets, we successfully introduced a strategy that exchanges the calibration set for more recent data, such as Tox21Test. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy, exchanging the calibration data only, is convenient because it does not require retraining the underlying model.
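The calibration check and the calibration-set exchange described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy one-dimensional descriptor, a fixed logistic function standing in for the already-trained model, and a class-conditional (Mondrian) inductive conformal predictor. All names (`sample`, `nonconformity`, `p_values`, `error_rate`) and the shift value of 1.5 are hypothetical choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, shift):
    """Toy two-class data; `shift` moves the descriptor distribution (assumed drift)."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=2.0 * y - 1.0 + shift, scale=1.0)
    return x, y

def nonconformity(x):
    """Scores from a fixed stand-in model: column c holds 1 - P(class c | x)."""
    p1 = 1.0 / (1.0 + np.exp(-x))           # assumed model output P(class 1 | x)
    return np.column_stack([p1, 1.0 - p1])

def p_values(cal_x, cal_y, test_x):
    """Class-conditional conformal p-values for each test sample and class."""
    cal_s = nonconformity(cal_x)[np.arange(len(cal_y)), cal_y]  # true-class scores
    test_s = nonconformity(test_x)
    p = np.empty_like(test_s)
    for c in (0, 1):
        s = cal_s[cal_y == c]
        # share of calibration scores at least as nonconforming as the test score
        p[:, c] = ((s[None, :] >= test_s[:, [c]]).sum(axis=1) + 1) / (len(s) + 1)
    return p

def error_rate(p, y, eps):
    """Fraction of samples whose true class is excluded from the prediction set."""
    return float(np.mean(p[np.arange(len(y)), y] <= eps))

eps = 0.2                                    # significance level (expected error rate)
old_cal_x, old_cal_y = sample(4000, 0.0)     # calibration data from the training era
old_test_x, old_test_y = sample(4000, 0.0)   # held-out data, same distribution
new_cal_x, new_cal_y = sample(4000, 1.5)     # more recent data, drifted
new_test_x, new_test_y = sample(4000, 1.5)   # external data, drifted

# Internal check: calibration and test data share a distribution.
err_internal = error_rate(p_values(old_cal_x, old_cal_y, old_test_x), old_test_y, eps)
# External check: old calibration set, drifted test data.
err_drift = error_rate(p_values(old_cal_x, old_cal_y, new_test_x), new_test_y, eps)
# Exchange only the calibration set; the stand-in model is left untouched.
err_updated = error_rate(p_values(new_cal_x, new_cal_y, new_test_x), new_test_y, eps)

print(f"expected error rate:           {eps:.3f}")
print(f"internal (old cal, old test):  {err_internal:.3f}")
print(f"drifted  (old cal, new test):  {err_drift:.3f}")
print(f"updated  (new cal, new test):  {err_updated:.3f}")
```

In this toy setting, an observed error rate well above the chosen significance level signals that calibration and test data no longer come from the same distribution. Recalibrating on recent data brings the error rate back towards the expected level without changing the model itself. Plotting the observed error rate against a range of significance levels gives a calibration plot of the kind named in the keywords.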