Affiliation: Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
Basak, SC
Mills, D
Affiliations:
[1] Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
[2] Univ Minnesota, Nat Resources Res Inst, Duluth, MN 55811 USA
Source:
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2003 / Vol. 43 / Issue 02
DOI:
10.1021/ci025626i
Chinese Library Classification:
O6 [Chemistry];
Discipline classification code:
0703;
Abstract:
When QSAR models are fitted, it is important to validate any fitted model, that is, to check that it is plausible that its predictions will carry over to fresh data not used in the model-fitting exercise. There are two standard ways of doing this: using a separate hold-out test sample, and the computationally much more burdensome leave-one-out cross-validation, in which the entire pool of available compounds is used both to fit the model and to assess its validity. We show by theoretical argument and empirical study of a large QSAR data set that when the available sample size is small (in the dozens or scores rather than the hundreds), holding a portion of it back for testing is wasteful, and that it is much better to use cross-validation, but to ensure that this is done properly.
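
As a rough illustration of the leave-one-out procedure the abstract describes, the sketch below refits a model once per compound, each time predicting the one compound that was left out, and summarizes the result as a cross-validated q^2 = 1 - PRESS / SS_total. It is not the authors' code: the single-descriptor straight-line model, the function names `fit_line` and `loo_q2`, and the toy data are all assumptions chosen to keep the example small.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def loo_q2(xs, ys):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    n = len(xs)
    press = 0.0
    for i in range(n):
        # Refit on every compound except compound i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        # Accumulate the squared error of predicting the held-out compound.
        press += (ys[i] - (a + b * xs[i])) ** 2
    mean_y = sum(ys) / n
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Hypothetical toy data: one descriptor value and one activity per compound.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
q2 = loo_q2(descriptor, activity)
```

Every compound contributes to both fitting and assessment here, which is the property the abstract contrasts with holding part of a small sample back for testing.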