Cross-validation of component models: A critical look at current methods

Cited by: 254

Authors:

Bro, R. [1]

Kjeldahl, K. [1]

Smilde, A. K. [2]

Kiers, H. A. L. [3]

Affiliations:

[1] Univ Copenhagen, Fac Life Sci, Chemometr Grp, DK-1958 Frederiksberg C, Denmark

[2] Swammerdam Inst Life Sci, BDA, NL-1018 WV Amsterdam, Netherlands

[3] Univ Groningen, Heymans Inst DPMG, NL-9712 TS Groningen, Netherlands

Source:

ANALYTICAL AND BIOANALYTICAL CHEMISTRY | 2008 / Volume 390 / Issue 5

Keywords:

overfitting; PRESS; cross-validation; PCA; rank estimation

DOI:

10.1007/s00216-007-1790-1

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Discipline classification codes:

071010; 081704

Abstract:

In regression, cross-validation is an effective and popular approach that is used to decide, for example, the number of underlying features, and to estimate the average prediction error. The basic principle of cross-validation is to leave out part of the data, build a model, and then predict the left-out samples. While such an approach can also be envisioned for component models such as principal component analysis (PCA), most current implementations do not comply with the essential requirement that the predictions should be independent of the entity being predicted. Further, these methods have not been properly reviewed in the literature. In this paper, we review the most commonly used generic PCA cross-validation schemes and assess how well they work in various scenarios.
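The basic principle described in the abstract (leave out part of the data, build a model, predict the left-out samples) can be illustrated for the regression case with a minimal sketch. This is not the authors' code: the function name `kfold_press`, the choice of ordinary least squares as the model, and the random fold assignment are all assumptions made for illustration. It accumulates the squared prediction errors of the left-out samples into PRESS (the predicted residual sum of squares named in the keywords).

```python
import numpy as np

def kfold_press(X, y, n_folds=5, seed=0):
    """K-fold cross-validation of an ordinary least squares model.

    Hypothetical helper for illustration only. Returns (PRESS, mean
    squared prediction error), where PRESS is the sum of squared
    prediction errors over all left-out samples.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.permutation(n)              # shuffle the sample indices
    folds = np.array_split(idx, n_folds)  # disjoint left-out segments

    press = 0.0
    for test in folds:
        train = np.setdiff1d(idx, test)   # every sample not left out

        # Build the model from the training samples only, so that the
        # predictions are independent of the samples being predicted.
        A_train = np.column_stack([np.ones(train.size), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

        # Predict the left-out samples and accumulate squared errors.
        A_test = np.column_stack([np.ones(test.size), X[test]])
        resid = y[test] - A_test @ coef
        press += float(resid @ resid)

    return press, press / n
```

In regression the left-out responses play no part in fitting, so the independence requirement holds by construction. The abstract's point is that many PCA cross-validation schemes do not meet this requirement, and this sketch does not attempt to cover that case.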


Pages: 1241-1251

Page count: 11
