Development and Evaluation of Conformal Prediction Methods for Quantitative Structure-Activity Relationship

Cited by: 3
Authors
Xu, Yuting [1]
Liaw, Andy [1]
Sheridan, Robert P. [2]
Svetnik, Vladimir [1]
Affiliations
[1] Merck & Co Inc, Early Dev Stat, Rahway, NJ 07065 USA
[2] Merck & Co Inc, Modeling & Informat, Rahway, NJ 07033 USA
Keywords
APPLICABILITY DOMAIN; UNCERTAINTY QUANTIFICATION; COMPOUND CLASSIFICATION; TRAINING SET; QSAR; TOOL
DOI
10.1021/acsomega.4c02017
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The quantitative structure-activity relationship (QSAR) regression model is a commonly used technique for predicting the biological activities of compounds from their molecular descriptors. Besides an accurate activity estimate, a measure of prediction uncertainty such as a prediction interval is highly desirable. Quantifying prediction uncertainty is an active research area in statistical and machine learning (ML), but its implementation for QSAR remains challenging: most ML algorithms with high predictive performance require add-on companions for estimating the uncertainty of their predictions. Conformal prediction (CP) is a promising approach because its main components are agnostic to the prediction models, and it produces valid prediction intervals under weak assumptions on the data distribution. We propose computationally efficient CP algorithms tailored to the most widely used ML models, including random forests, deep neural networks, and gradient boosting. The algorithms use a novel approach to deriving nonconformity scores from the estimates of prediction uncertainty generated by ensembles of point predictions. The validity and efficiency of the proposed algorithms are demonstrated on a diverse collection of QSAR data sets as well as in simulation studies. The provided software implementing our algorithms can be used stand-alone or easily incorporated into other ML software packages for QSAR modeling.
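As a rough illustration of the general idea in the abstract, the following is a minimal sketch of generic split conformal prediction with a nonconformity score normalized by ensemble spread. It is not the authors' algorithm or software: the function name `conformal_intervals`, the input layout, the bootstrap ensemble of least-squares fits, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def conformal_intervals(ens_cal, y_cal, ens_test, alpha=0.1, eps=1e-8):
    """Split conformal intervals normalized by ensemble spread.

    ens_cal and ens_test have shape (n_members, n_samples) and hold the
    point predictions of each ensemble member (hypothetical layout).
    """
    mu_cal, sd_cal = ens_cal.mean(axis=0), ens_cal.std(axis=0) + eps
    mu_test, sd_test = ens_test.mean(axis=0), ens_test.std(axis=0) + eps

    # Nonconformity score: absolute error scaled by the ensemble spread.
    scores = np.abs(y_cal - mu_cal) / sd_cal

    # Finite-sample quantile: the ceil((n + 1)(1 - alpha))-th smallest score.
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else np.sort(scores)[k - 1]

    return mu_test - q * sd_test, mu_test + q * sd_test


# Synthetic stand-in for a QSAR data set (assumption: not real assay data).
rng = np.random.default_rng(0)
n_train, n_cal, n_test, p = 200, 200, 500, 5
X = rng.normal(size=(n_train + n_cal + n_test, p))
beta = rng.normal(size=p)
noise_scale = 1.0 + np.abs(X[:, 0])  # heteroscedastic noise
y = X @ beta + rng.normal(scale=noise_scale, size=X.shape[0])

X_tr, y_tr = X[:n_train], y[:n_train]
X_cal, y_cal = X[n_train:n_train + n_cal], y[n_train:n_train + n_cal]
X_te, y_te = X[n_train + n_cal:], y[n_train + n_cal:]

# Stand-in ensemble: least-squares fits on bootstrap resamples of the
# training set. Any ensemble of point predictors could take its place.
members = []
for _ in range(25):
    idx = rng.integers(0, n_train, size=n_train)
    coef = np.linalg.lstsq(X_tr[idx], y_tr[idx], rcond=None)[0]
    members.append(coef)
members = np.array(members)  # shape (25, p)

ens_cal = members @ X_cal.T  # shape (25, n_cal)
ens_test = members @ X_te.T  # shape (25, n_test)

lower, upper = conformal_intervals(ens_cal, y_cal, ens_test, alpha=0.1)
coverage = np.mean((y_te >= lower) & (y_te <= upper))
print(f"empirical coverage at nominal 90%: {coverage:.3f}")
```

Because the calibration and test compounds are exchangeable in this toy setup, the empirical coverage should land near the nominal level; scaling by the ensemble spread only changes how interval width is distributed across compounds.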
Pages: 29478-29490
Number of pages: 13