Dimensionality choice in principal components analysis via cross-validatory methods

Cited by: 14
Author
Eshghi, Peyman [1]
Affiliation
[1] GlaxoSmithKline, Dept Stat Sci, Ware SG12 0DP, Herts, England
Keywords
NIPALS; PCA; Cross-validation; Missing data; Number; Matrix; Models
DOI
10.1016/j.chemolab.2013.09.004
Chinese Library Classification (CLC) code
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper considers cross-validation based approaches to automatically determine the appropriate number of dimensions to retain in a Principal Components Analysis (PCA). Three approaches based on a mixture of leaving groups of observations and variables out are described. They are compared through simulation across a range of datasets of differing sizes and differing levels of missingness using the NIPALS algorithm to carry out the PCA. Also included in the paper is an explicit description of how the NIPALS algorithm is implemented to deal with missing data. Finally we provide suggestions as to which approach offers a better compromise between reliability in choosing the optimal number of components, and the computational burden. (C) 2013 Elsevier B.V. All rights reserved.
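The abstract notes that NIPALS can run a PCA when some cells are missing. The record does not reproduce the paper's own description, so the following is only a minimal sketch of one commonly used variant, in which each regression step simply skips the missing cells. The function name `nipals_component`, the use of `None` to mark missing cells, the starting vector, and the convergence tolerance are all assumptions made for illustration, and the input is assumed to be column-centred already.

```python
import math

def nipals_component(X, max_iter=500, tol=1e-12):
    """Extract one component from X (a list of rows), skipping None cells.

    Illustrative sketch only: X is assumed column-centred, and missing
    cells are marked with None. Returns (scores, loadings).
    """
    n, p = len(X), len(X[0])
    # Assumed start: the first column, with missing cells set to zero.
    t = [X[i][0] if X[i][0] is not None else 0.0 for i in range(n)]
    loadings = [0.0] * p
    for _ in range(max_iter):
        # Regress each column on the scores, using observed rows only.
        for j in range(p):
            rows = [i for i in range(n) if X[i][j] is not None]
            denom = sum(t[i] ** 2 for i in rows)
            loadings[j] = sum(X[i][j] * t[i] for i in rows) / denom if denom else 0.0
        norm = math.sqrt(sum(v * v for v in loadings))
        if norm == 0.0:
            raise ValueError("starting scores carry no information")
        loadings = [v / norm for v in loadings]
        # Regress each row on the loadings, using observed columns only.
        t_new = []
        for i in range(n):
            cols = [j for j in range(p) if X[i][j] is not None]
            denom = sum(loadings[j] ** 2 for j in cols)
            t_new.append(sum(X[i][j] * loadings[j] for j in cols) / denom if denom else 0.0)
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        # Stop once the scores no longer move relative to their size.
        if change <= tol * sum(v * v for v in t):
            break
    return t, loadings

# Toy rank-one matrix with one cell held out (marked None).
X = [[2.0, 1.0, None],
     [4.0, 2.0, -2.0],
     [-6.0, -3.0, 3.0]]
scores, loadings = nipals_component(X)
predicted = scores[0] * loadings[2]  # model estimate of the held-out cell
```

In this toy example the product of the fitted score and loading gives an estimate for the held-out cell, which is the kind of prediction a cross-validatory criterion would compare against the true value. How the paper forms its left-out groups of observations and variables is not stated in this record and is not reproduced here.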
Pages: 6-13
Page count: 8