Estimation of influential points in any data set from coefficient of determination and its leave-one-out cross-validated counterpart

Cited by: 23
Authors
Toth, Gergely [1]
Bodai, Zsolt [1]
Heberger, Karoly [2]
Affiliations
[1] Eotvos Lorand Univ, Inst Chem, H-1117 Budapest, Hungary
[2] Hungarian Acad Sci, Res Ctr Nat Sci, Inst Mat & Environm Chem, H-1025 Budapest, Hungary
Keywords
Coefficient of determination; Leave-one-out cross-validation; Influence analysis; Quantitative structure-activity relationships; Prediction; Training set; QSAR models; 3D-QSAR
DOI
10.1007/s10822-013-9680-4
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
The coefficient of determination ($R^2$) and its leave-one-out cross-validated analogue (denoted $Q^2$ or $R^2_{cv}$) are the values most frequently published to characterize the predictive performance of models. In this article we use $R^2$ and $Q^2$ in reverse: to identify uncommon points, i.e. influential points, in any data set. The term $(1 - Q^2)/(1 - R^2)$ is equal to the ratio of the predictive residual sum of squares (PRESS) to the residual sum of squares (RSS). This ratio correlates with the number of influential points in experimental and random data sets. We propose an (approximate) F test on the $(1 - Q^2)/(1 - R^2)$ term to quickly pre-estimate whether influential points are present in the training set of a model. The test relies only on the routinely calculated $Q^2$ and $R^2$ values, and it warns model builders to verify the training set, to perform an influence analysis, or even to switch to robust modeling.
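Both statistics share the total sum of squares TSS, with $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$ and $Q^2 = 1 - \mathrm{PRESS}/\mathrm{TSS}$, so the ratio reduces to $(1 - Q^2)/(1 - R^2) = \mathrm{PRESS}/\mathrm{RSS}$. The following is a minimal sketch of that idea for a one-predictor least-squares model, assuming plain Python with no third-party packages. It computes $R^2$, computes $Q^2$ by brute-force leave-one-out refits, and then evaluates an upper F-tail probability through the regularized incomplete beta function (a Numerical Recipes-style continued fraction). All function names are hypothetical. The degrees of freedom `df1` and `df2` are caller-supplied assumptions, not the values specified in the article, so substitute the authors' choice before relying on the result.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r2_q2(xs, ys):
    """Return (R2, Q2) for a one-predictor linear model.

    Q2 = 1 - PRESS/TSS, where PRESS sums the squared errors of leave-one-out
    refits, each one predicting only the point that was held out.
    """
    n = len(xs)
    my = sum(ys) / n
    tss = sum((y - my) ** 2 for y in ys)
    a, b = fit_line(xs, ys)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    press = 0.0
    for i in range(n):
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a_i + b_i * xs[i])) ** 2
    return 1 - rss / tss, 1 - press / tss


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Upper-tail probability P(F > f) for an F(df1, df2) distribution."""
    if f <= 0.0:
        return 1.0
    return _betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def influential_point_check(xs, ys, df1, df2, alpha=0.05):
    """Approximate F test on (1 - Q2)/(1 - R2) = PRESS/RSS.

    df1 and df2 are ASSUMED by the caller, not taken from the paper.
    Requires RSS > 0 (i.e. R2 < 1). Returns (ratio, p_value, flag), where
    flag=True suggests the training set deserves an influence analysis.
    """
    r2, q2 = r2_q2(xs, ys)
    ratio = (1 - q2) / (1 - r2)
    p = f_sf(ratio, df1, df2)
    return ratio, p, p < alpha


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 1.0, 2.0]
n = len(xs)
# Illustrative choice only: df1 = df2 = n - 2 (residual df of a one-predictor fit).
ratio, p, flag = influential_point_check(xs, ys, n - 2, n - 2)
print(f"PRESS/RSS = {ratio:.4f}, p = {p:.4f}, flag = {flag}")
```

For the four-point demo data the leave-one-out errors inflate the error sum roughly threefold (PRESS/RSS ≈ 2.95), which is not significant against the assumed F(2, 2) reference at α = 0.05. Even a flagged result would only suggest, not prove, that influential points merit a closer look.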
Pages: 837-844
Number of pages: 8