A new measure of regression model accuracy that considers applicability domains

Cited by: 14
Authors
Kaneko, Hiromasa [1]
Affiliations
[1] Meiji Univ, Sch Sci & Technol, Dept Appl Chem, Tama Ku, 1-1-1 Higashi Mita, Kawasaki, Kanagawa 2148571, Japan
Keywords
Regression; Measure; Applicability domain; Predictive performance; QSPR; QSAR; CLASSIFICATION; VALIDATION; SELECTION
DOI
10.1016/j.chemolab.2017.09.018
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The coefficient of determination and the root-mean-squared error (RMSE) evaluate regression models on test samples without considering the applicability domains (ADs) of the models. In this study, we propose a new measure for evaluating the predictive performance of regression models that considers their ADs. The purpose is not to select the best regression model among various competing models, but to determine an appropriate group of models corresponding to the AD of each model. The proposed measure is the area under the coverage and RMSE curve for coverage less than p% (p%-AUCR). It is confirmed that some regression models have global predictive ability whereas others have local predictive ability, and that p%-AUCR is an appropriate indicator for choosing between local and global regression models depending on the coverage while considering the AD. Selecting a regression model for each sample or each chemical structure using p%-AUCR can improve the prediction accuracy for data sets.
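The abstract names p%-AUCR but does not spell out how the curve is built, so the following is only a minimal sketch under stated assumptions. It assumes that each test sample has a scalar AD index (here a hypothetical `ad_distance`, where smaller means more reliable), that the coverage and RMSE curve is built by adding samples from most to least reliable and recomputing RMSE cumulatively, and that the area is taken with the trapezoidal rule. The function names `coverage_rmse_curve` and `p_aucr` are illustrative and do not come from the paper.

```python
import numpy as np


def coverage_rmse_curve(y_true, y_pred, ad_distance):
    """Return (coverage, rmse) arrays for a cumulative coverage-RMSE curve.

    Assumption: samples are added in order of increasing ad_distance,
    i.e. from most reliable to least reliable.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    order = np.argsort(ad_distance, kind="stable")
    sq_err = (y_true[order] - y_pred[order]) ** 2
    n = np.arange(1, len(sq_err) + 1)
    coverage = n / len(sq_err)             # fraction of test samples included
    rmse = np.sqrt(np.cumsum(sq_err) / n)  # RMSE over the included samples
    return coverage, rmse


def p_aucr(coverage, rmse, p):
    """Trapezoidal area under the curve for coverage <= p (p as a fraction)."""
    mask = coverage <= p + 1e-12
    c, r = coverage[mask], rmse[mask]
    if len(c) < 2:
        return 0.0
    return float(np.sum(np.diff(c) * (r[1:] + r[:-1]) / 2.0))


if __name__ == "__main__":
    # Toy data: errors grow as samples move away from the training data.
    rng = np.random.default_rng(0)
    dist = rng.uniform(0.0, 1.0, size=200)
    y = rng.normal(size=200)
    y_hat = y + rng.normal(scale=0.1 + dist, size=200)
    cov, err = coverage_rmse_curve(y, y_hat, dist)
    for p in (0.2, 0.5, 1.0):
        print(f"{int(p * 100)}%-AUCR = {p_aucr(cov, err, p):.4f}")
```

Under these assumptions, a lower area at small p would point to a model that is accurate close to its training data (local predictive ability), while a lower area at large p would point to a model that stays accurate over most of the test set (global predictive ability).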
Pages: 1-8
Page count: 8