Conformal Regression for Quantitative Structure-Activity Relationship Modeling: Quantifying Prediction Uncertainty

Cited by: 44
Authors
Svensson, Fredrik [1 ,2 ]
Aniceto, Natalia [1 ]
Norinder, Ulf [3 ,4 ]
Cortes-Ciriano, Isidro [1 ]
Spjuth, Ola [5 ]
Carlsson, Lars [6 ,7 ]
Bender, Andreas [1 ]
Affiliations
[1] Univ Cambridge, Ctr Mol Informat, Dept Chem, Lensfield Rd, Cambridge CB2 1EW, England
[2] IOTA Pharmaceut, St Johns Innovat Ctr, Cowley Rd, Cambridge CB4 0WS, England
[3] Karolinska Inst, Unit Toxicol Sci, Swetox, Forskargatan 20, SE-15136 Sodertalje, Sweden
[4] Stockholm Univ, Dept Comp & Syst Sci, Box 7003, SE-16407 Kista, Sweden
[5] Uppsala Univ, Dept Pharmaceut Biosci, Box 591, SE-75124 Uppsala, Sweden
[6] AstraZeneca, IMED Biotech Unit, Discovery Sci, Quantitat Biol, SE-43183 Molndal, Sweden
[7] Royal Holloway Univ London, Dept Comp Sci, Egham, Surrey, England
Funding
EU Horizon 2020; Swedish Research Council
Keywords
APPLICABILITY DOMAIN; QSAR MODELS; CLASSIFICATION; TRANSPARENT; INHIBITION; VALIDATION; SOLUBILITY; EFFICIENCY; ERROR; SET;
DOI
10.1021/acs.jcim.8b00054
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Making predictions with an associated confidence is highly desirable as it facilitates decision making and resource prioritization. Conformal regression is a machine learning framework that allows the user to define the required confidence and delivers predictions that are guaranteed to be correct to the selected extent. In this study, we apply conformal regression to model molecular properties and bioactivity values and investigate different ways to scale the resultant prediction intervals to create as efficient (i.e., narrow) regressors as possible. Different algorithms to estimate the prediction uncertainty were used to normalize the prediction ranges, and the different approaches were evaluated on 29 publicly available data sets. Our results show that the most efficient conformal regressors are obtained when using the natural exponential of the ensemble standard deviation from the underlying random forest to scale the prediction intervals, but other approaches were almost as efficient. This approach afforded an average prediction range of 1.65 pIC50 units at the 80% confidence level when applied to bioactivity modeling. The choice of nonconformity function has a pronounced impact on the average prediction range with a difference of close to one log unit in bioactivity between the tightest and widest prediction range. Overall, conformal regression is a robust approach to generate bioactivity predictions with associated confidence.
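The scaling described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes an inductive (split) conformal setup with a held-out calibration set, assumes the nonconformity score takes the form |y - ŷ| / exp(σ), where σ is the standard deviation of the per-tree random forest predictions, and uses hypothetical function names (`calibrate`, `predict_interval`).

```python
import math


def calibrate(y_true, y_pred, tree_std, confidence=0.8):
    """Return the normalized nonconformity score for a confidence level.

    Assumed score: |y - y_hat| / exp(sigma), where sigma is the ensemble
    standard deviation for that compound (hypothetical helper).
    """
    scores = sorted(
        abs(y - p) / math.exp(s) for y, p, s in zip(y_true, y_pred, tree_std)
    )
    n = len(scores)
    # Rank of the calibration score to use, with the (n + 1)
    # finite-sample correction of split conformal prediction.
    rank = math.ceil(confidence * (n + 1))
    if rank > n:
        raise ValueError("calibration set too small for this confidence level")
    return scores[rank - 1]


def predict_interval(y_pred, tree_std, alpha):
    """Scale the calibrated score by exp(sigma) to get a per-compound interval."""
    half_width = alpha * math.exp(tree_std)
    return y_pred - half_width, y_pred + half_width
```

With this form, compounds on which the trees disagree (large σ) receive wider intervals, and compounds on which they agree receive narrower ones, which is how normalization can make the regressor more efficient on average while the calibration step targets the chosen confidence level.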
Pages: 1132-1140
Number of pages: 9