Dynamic determination of the dimension of PCA calibration models using F-statistics

Times cited: 18
Authors
Vogt, F [1]
Mizaikoff, B [1]
Affiliation
[1] Georgia Inst Technol, Sch Chem & Biochem, Atlanta, GA 30332 USA
Keywords
principal component analysis/regression (PCA/PCR); dimension of calibration models; dynamic PCA model adjustment; F-statistics; optical spectra
DOI
10.1002/cem.813
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Owing to experimental measurement errors, determining the proper dimension of calibration models is difficult. Cross-validation is a common approach for this purpose; however, if data evaluation is based on PCA alone, without consideration of sample concentrations, this computationally expensive method cannot be applied. In this study, a statistical method for determining the proper dimension of PCA calibration models is presented from the viewpoint of multivariate regression analysis, using only the measured data. In this iterative algorithm, individual principal components are included stepwise in a reduced model, which is subsequently tested against the full model including all principal components. This comparison is performed by an F-test comparing estimates of the residual variance of a measured spectrum determined from the reduced and the full model. The test thus detects a lack of fit caused by too few principal components. If no lack of fit is evident for a given reduced model, the model is considered sufficiently large and no further principal components are included. Hence the resulting reduced calibration model includes only statistically significant principal components (PCs) and determines the minimum number of PCs required for a given measured spectrum. The algorithm can be applied individually to every measured data vector, such as the optical spectrum of a chemical analyte, for optimized data evaluation. The proposed algorithm is first investigated using simulated data and subsequently applied to three different sets of experimental spectra. It is shown that for synthetic data at reasonable noise levels the correct number of principal components is determined in most cases. The experimental examples demonstrate that the number of principal components determined by the proposed algorithm is slightly larger than the number a user would select manually by subjective visual inspection.
As a result, the algorithm is able to detect small but significant spectroscopic features of experimental data that would otherwise be neglected. Copyright (C) 2003 John Wiley & Sons, Ltd.
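The abstract does not state the exact test statistic, so the following is only a minimal sketch of the stepwise procedure for a single spectrum x, assuming a standard extra-sum-of-squares (partial) F-test and PCA loadings taken from an SVD of uncentred calibration spectra. For a reduced model with the first k of K PCs and n spectral points, it uses F = [(RSS_k - RSS_K)/(K - k)] / [RSS_K/(n - K)] and adds PCs until F falls below the (1 - alpha) quantile of the F(K - k, n - K) distribution. The function names (`pca_loadings`, `residual_ss`, `select_n_pcs`), the significance level `alpha` and the simulated Gaussian-band data are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np
from scipy.stats import f as f_dist


def pca_loadings(X, n_components):
    """First n_components PC loadings of X (one spectrum per row).

    Assumption: no mean-centring; the abstract does not say whether the
    original work centres the data.
    """
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_components].T  # (n_points, n_components), orthonormal columns


def residual_ss(x, P):
    """Residual sum of squares of spectrum x after projection onto the columns of P."""
    r = x - P @ (P.T @ x)
    return float(r @ r)


def select_n_pcs(x, P_full, alpha=0.01):
    """Smallest k whose reduced model shows no significant lack of fit.

    Assumption: a partial F-test of the first k PCs against all K PCs;
    the paper's exact F-statistic may differ.
    """
    n, K = P_full.shape
    if K >= n:
        raise ValueError("the full model must use fewer PCs than spectral points")
    rss_full = residual_ss(x, P_full)
    df_full = n - K
    for k in range(1, K):
        df_extra = K - k
        rss_red = residual_ss(x, P_full[:, :k])  # never smaller than rss_full
        F = ((rss_red - rss_full) / df_extra) / (rss_full / df_full)
        if F < f_dist.ppf(1.0 - alpha, df_extra, df_full):
            return k  # no significant lack of fit: stop adding PCs
    return K  # every reduced model showed lack of fit


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.arange(200)
    # Hypothetical data: three pure-component "spectra" as Gaussian bands.
    S = np.array([np.exp(-0.5 * ((grid - c) / 8.0) ** 2) for c in (50, 100, 150)])
    X = rng.uniform(0, 1, (200, 3)) @ S + rng.normal(0, 1e-3, (200, 200))
    P = pca_loadings(X, 10)
    x_new = rng.uniform(0, 1, 3) @ S + rng.normal(0, 1e-3, 200)
    print(select_n_pcs(x_new, P))  # usually 3 for this three-component mixture
```

Because the residual degrees of freedom of the full model, n - K, must be positive, K has to be smaller than the number of spectral points. A smaller `alpha` raises the critical value, so the loop stops earlier and tends to keep fewer PCs.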
Pages: 346-357
Number of pages: 12