Enhancing uncertainty quantification in drug discovery with censored regression labels

Times Cited: 0
Authors
Svensson, Emma [1,2,3]
Friesacher, Hannah Rosa [1,4]
Winiwarter, Susanne [5]
Mervin, Lewis [6]
Arany, Adam [4]
Engkvist, Ola [1,7]
Affiliations
[1] AstraZeneca, Mol AI, Discovery Sci, R&D, S-43183 Gothenburg, Sweden
[2] Johannes Kepler Univ Linz, ELLIS Unit Linz, A-4040 Linz, Austria
[3] Johannes Kepler Univ Linz, Inst Machine Learning, A-4040 Linz, Austria
[4] Katholieke Univ Leuven, ESAT STADIUS, B-3000 Leuven, Belgium
[5] AstraZeneca, Drug Metab & Pharmacokinet Res & Early Dev Cardiov, Renal & Metab CVRM, BioPharmaceut R&D, S-43183 Gothenburg, Sweden
[6] AstraZeneca, Mol AI, Discovery Sci, R&D, Cambridge CB2 0AA, England
[7] Chalmers Univ Technol, Dept Comp Sci & Engn, S-41296 Gothenburg, Sweden
Source
ARTIFICIAL INTELLIGENCE IN THE LIFE SCIENCES | 2025, Vol. 7
Funding
EU Horizon 2020;
Keywords
Uncertainty quantification; Censored regression; Temporal evaluation; Distribution shift; Deep learning; Drug discovery; Molecular property prediction; QSAR MODELS; APPLICABILITY; INFORMATION; PREDICTION; DOMAIN;
DOI
10.1016/j.ailsci.2025.100128
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline Codes
071010; 081704;
Abstract
In the early stages of drug discovery, decisions regarding which experiments to pursue can be influenced by computational models for quantitative structure-activity relationships (QSAR). These decisions are critical due to the time-consuming and expensive nature of the experiments. Therefore, it is becoming essential to accurately quantify the uncertainty in machine learning predictions, such that resources can be used optimally and trust in the models improves. While computational methods for QSAR modeling often suffer from limited data and sparse experimental observations, additional information can exist in the form of censored labels that provide thresholds rather than precise values of observations. However, the standard approaches that quantify uncertainty in machine learning cannot fully utilize censored labels. In this work, we adapt ensemble-based, Bayesian, and Gaussian models with tools to learn from censored labels by using the Tobit model from survival analysis. Our results demonstrate that despite the partial information available in censored labels, they are essential to reliably estimate uncertainties in real pharmaceutical settings where approximately one-third or more of experimental labels are censored.
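The Tobit approach referenced in the abstract handles a censored label by replacing the usual Gaussian log-density with the probability mass lying beyond the reported threshold. The following sketch (an illustration only, not the authors' code) shows one way to write such a censored negative log-likelihood in PyTorch for a model that predicts a per-compound mean and standard deviation; the censoring indicator c and the 1e-12 stabilizer are assumptions introduced for this example.

import torch
from torch.distributions import Normal

def tobit_nll(mu: torch.Tensor, sigma: torch.Tensor,
              y: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    """Tobit-style negative log-likelihood.

    mu, sigma : predicted Gaussian mean and standard deviation per sample
    y         : observed value, or the censoring threshold if censored
    c         : censoring indicator (0 = exact, +1 = right-censored "> y",
                -1 = left-censored "< y"); this encoding is an assumption
                made for the example.
    """
    dist = Normal(mu, sigma)
    # Exact labels contribute the standard Gaussian log-density.
    ll_exact = dist.log_prob(y)
    # Right-censored labels contribute log P(Y > y) = log(1 - CDF(y)).
    ll_right = torch.log1p(-dist.cdf(y) + 1e-12)
    # Left-censored labels contribute log P(Y < y) = log(CDF(y)).
    ll_left = torch.log(dist.cdf(y) + 1e-12)
    ll = torch.where(c == 0, ll_exact,
                     torch.where(c > 0, ll_right, ll_left))
    return -ll.mean()

# Example usage with a mix of exact, right-censored, and left-censored labels:
# mu    = torch.tensor([5.1, 6.3, 4.0])
# sigma = torch.tensor([0.5, 0.4, 0.6])
# y     = torch.tensor([5.0, 6.0, 4.5])
# c     = torch.tensor([0, 1, -1])
# loss  = tobit_nll(mu, sigma, y, c)

The same loss can be dropped into ensemble-based or Bayesian training loops, since only the per-sample likelihood term changes when a label is censored.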
Pages: 16