Cross-validation of component models: A critical look at current methods

Cited by: 251
Authors
Bro, R. [1]
Kjeldahl, K. [1]
Smilde, A. K. [2]
Kiers, H. A. L. [3]
Affiliations
[1] Univ Copenhagen, Fac Life Sci, Chemometr Grp, DK-1958 Frederiksberg C, Denmark
[2] Swammerdam Inst Life Sci, BDA, NL-1018 WV Amsterdam, Netherlands
[3] Univ Groningen, Heymans Inst DPMG, NL-9712 TS Groningen, Netherlands
Keywords
overfitting; PRESS; cross-validation; PCA; rank estimation
DOI
10.1007/s00216-007-1790-1
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
In regression, cross-validation is an effective and popular approach that is used to decide, for example, the number of underlying features, and to estimate the average prediction error. The basic principle of cross-validation is to leave out part of the data, build a model, and then predict the left-out samples. While such an approach can also be envisioned for component models such as principal component analysis (PCA), most current implementations do not comply with the essential requirement that the predictions should be independent of the entity being predicted. Further, these methods have not been properly reviewed in the literature. In this paper, we review the most commonly used generic PCA cross-validation schemes and assess how well they work in various scenarios.
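The independence problem mentioned in the abstract can be illustrated with a minimal sketch. It assumes a naive row-wise leave-one-out scheme, which is one common variant and not necessarily any specific scheme reviewed in the paper. The function name `rowwise_press` and the simulated data are hypothetical. Because the score of the left-out row is computed from that same row, the reconstruction error can only shrink as components are added, so this PRESS gives no usable signal for choosing the rank.

```python
import numpy as np

def rowwise_press(X, max_components):
    """Naive row-wise leave-one-out PRESS for PCA (illustrative only)."""
    n_rows = X.shape[0]
    press = np.zeros(max_components)
    for i in range(n_rows):
        # Fit the PCA loadings on all rows except row i.
        train = np.delete(X, i, axis=0)
        mean = train.mean(axis=0)
        _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
        x = X[i] - mean
        for k in range(1, max_components + 1):
            P = vt[:k].T  # loadings, shape (n_vars, k)
            # The score P.T @ x uses the left-out row itself, so the
            # "prediction" is not independent of what is being predicted.
            x_hat = P @ (P.T @ x)
            press[k - 1] += np.sum((x - x_hat) ** 2)
    return press

# Hypothetical data: two underlying components plus a little noise.
rng = np.random.default_rng(0)
scores = rng.normal(size=(30, 2))
loadings = rng.normal(size=(2, 8))
X = scores @ loadings + 0.1 * rng.normal(size=(30, 8))

press = rowwise_press(X, max_components=6)
print(press)
```

Under these assumptions the printed values never increase with the number of components, even past the true rank of two. This follows from projecting onto nested orthonormal subspaces, which cannot increase the residual.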
Pages: 1241-1251
Number of pages: 11