Assessing model fit by cross-validation

Cited by: 651
Authors
Hawkins, DM [1]
Basak, SC
Mills, D
Affiliations
[1] Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
[2] Univ Minnesota, Nat Resources Res Inst, Duluth, MN 55811 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2003 / Vol. 43 / No. 02
DOI
10.1021/ci025626i
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
When QSAR models are fitted, it is important to validate any fitted model, that is, to check that it is plausible that its predictions will carry over to fresh data not used in the model-fitting exercise. There are two standard ways of doing this: using a separate hold-out test sample, and the computationally much more burdensome leave-one-out cross-validation, in which the entire pool of available compounds is used both to fit the model and to assess its validity. We show by theoretical argument and by an empirical study of a large QSAR data set that when the available sample size is small (in the dozens or scores rather than the hundreds), holding a portion of it back for testing is wasteful, and that it is much better to use cross-validation, but to ensure that this is done properly.
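The abstract contrasts two ways of checking predictive ability. The Python sketch below is a minimal illustration on synthetic data, not the authors' code: it computes the leave-one-out q² = 1 − PRESS/SS_total over the whole pool (PRESS being the sum of squared leave-one-out prediction errors) and, for comparison, an R² on a held-out subset. The single-descriptor straight-line model, the synthetic data, and the names `fit_line`, `select_and_fit`, `loo_q2` and `holdout_r2` are all hypothetical. One reading of "done properly", assumed here rather than stated in the abstract, is that any descriptor-selection step is repeated inside every fold, so the left-out compound plays no part in choosing or fitting the model that predicts it.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # constant descriptor: fall back to predicting the mean
        return my, 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def select_and_fit(X, y):
    """Hypothetical selection step: keep the one descriptor (column of X)
    whose straight-line fit has the smallest residual sum of squares."""
    best = None
    for j in range(len(X[0])):
        xs = [row[j] for row in X]
        a, b = fit_line(xs, y)
        rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, y))
        if best is None or rss < best[0]:
            best = (rss, j, a, b)
    _, j, a, b = best
    return j, a, b


def loo_q2(X, y):
    """Leave-one-out q2 = 1 - PRESS / SS_total (SS_total about the mean of all y).
    Selection AND fitting are redone on the n-1 remaining compounds each time."""
    n = len(y)
    press = 0.0
    for i in range(n):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        j, a, b = select_and_fit(X_train, y_train)
        press += (y[i] - (a + b * X[i][j])) ** 2
    my = mean(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_total


def holdout_r2(X, y, n_test, seed=0):
    """Hold-out check: fit on a random training subset, score R2 on the rest only."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    test, train = idx[:n_test], idx[n_test:]
    j, a, b = select_and_fit([X[k] for k in train], [y[k] for k in train])
    y_test = [y[k] for k in test]
    preds = [a + b * X[k][j] for k in test]
    m = mean(y_test)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_test, preds))
    ss_tot = sum((t - m) ** 2 for t in y_test)
    return 1.0 - ss_res / ss_tot


# Synthetic "QSAR" data: 30 compounds, 5 descriptors, activity driven by descriptor 0.
rng = random.Random(1)
n, p = 30, 5
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] + rng.gauss(0.0, 0.5) for row in X]

print(f"LOO q2 using all {n} compounds: {loo_q2(X, y):.3f}")
print(f"Hold-out R2 on 10 held-back compounds: {holdout_r2(X, y, n_test=10):.3f}")
```

Both numbers estimate predictive ability, but the hold-out figure rests on only 10 predictions while the leave-one-out figure uses all 30; calling `holdout_r2` with different `seed` values shows how much a single small split can move the estimate.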
Pages: 579-586
Page count: 8
Related papers (22 in total)
[1] Allen, DM. Relationship between variable selection and data augmentation and a method for prediction [J]. Technometrics, 1974, 16(1): 125-127.
[2] [Anonymous], 1982, JACKNIFE BOOTSTRAP O
[3] Basak, SC; Gute, BD; Grunwald, GD. Use of topostructural, topochemical, and geometric parameters in the prediction of vapor pressure: A hierarchical QSAR approach [J]. Journal of Chemical Information and Computer Sciences, 1997, 37(4): 651-655.
[4] Basak, SC; Mills, D. Quantitative structure-property relationships (QSPRs) for the estimation of vapor pressure: A hierarchical approach using mathematical structural descriptors [J]. Journal of Chemical Information and Computer Sciences, 2001, 41(3): 692-701.
[5] Droge, B. Asymptotic optimality of full cross-validation for selecting linear regression models [J]. Statistics & Probability Letters, 1999, 44(4): 351-357.
[6] Frank, IE; Friedman, JH. A statistical view of some chemometrics regression tools [J]. Technometrics, 1993, 35(2): 109-135.
[7] Galil, Z; Kiefer, J. Time-saving and space-saving computer methods, related to Mitchell DETMAX, for finding D-optimum designs [J]. Technometrics, 1980, 22(3): 301-313.
[8] Geisser, S. Predictive sample reuse method with applications [J]. Journal of the American Statistical Association, 1975, 70(350): 320-328.
[9] Golbraikh, A; Tropsha, A. Beware of q2! [J]. Journal of Molecular Graphics & Modelling, 2002, 20(4): 269-276.
[10] Golub, GH; Heath, M; Wahba, G. Generalized cross-validation as a method for choosing a good ridge parameter [J]. Technometrics, 1979, 21(2): 215-223.