Data understanding with PCA: Structural and Variance Information plots

Cited by: 74
Authors
Camacho, Jose [1]
Pico, Jesus [2]
Ferrer, Alberto [3]
Affiliations
[1] Univ Girona, Dept Engn Elect Elect & Automat, Girona 17071, Spain
[2] Univ Politecn Valencia, Dept Ingn Sistemas & Automat, Valencia 46022, Spain
[3] Univ Politecn Valencia, Dept Estadist & Invest Operat Aplicadas & Calidad, Valencia 46022, Spain
Keywords
Principal Component Analysis; Data understanding; Variables relationships; Cross-validation; MISSING DATA; CROSS-VALIDATION; REGRESSION; FRAMEWORK; MODELS; COLOR; MSPC;
DOI
10.1016/j.chemolab.2009.10.005
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Principal Components Analysis (PCA) is a useful tool for discovering the relationships among the variables in a data set. Nonetheless, interpreting a PCA model can be tricky, since loadings of high magnitude in a Principal Component (PC) do not necessarily imply correlation among the corresponding variables. To avoid misinterpretation of PCA, a new type of plot, named the Structural and Variance Information (SVI) plot, is proposed. These plots are supported by a sound theoretical study of the relationships among variables captured by PCA, and provide the keys to understanding these relationships. SVI plots are aimed at data understanding with PCA and are useful tools for determining the number of PCs in the model according to the pursued goal (e.g. data understanding, missing data recovery, data compression, multivariate statistical process control). Several simulated and real data sets are used for illustration. © 2009 Elsevier B.V. All rights reserved.
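The abstract's caution that high loadings in a PC need not imply correlation can be illustrated with a small sketch. This is not the SVI plot method of the paper; it is a hypothetical three-variable example, assuming x1 and x2 are independent with unit variance and x3 = x1 + x2, with PCA performed by eigendecomposition of the resulting population covariance matrix (no autoscaling).

```python
import numpy as np

# Hypothetical population covariance of (x1, x2, x3), assuming
# x1 and x2 are independent with unit variance and x3 = x1 + x2.
cov = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 2.0],
])

# PCA via eigendecomposition of the covariance matrix.
# eigh returns eigenvalues in ascending order, so reorder to descending.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
loadings = eigvecs[:, order]

# Loadings of the first PC; fix the arbitrary sign so the largest entry is positive.
pc1 = loadings[:, 0]
pc1 = pc1 * np.sign(pc1[np.argmax(np.abs(pc1))])

# Correlation between x1 and x2 implied by the covariance matrix.
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)
corr_x1_x2 = corr[0, 1]

print("PC1 loadings:", np.round(pc1, 3))
print("corr(x1, x2):", corr_x1_x2)
```

In this construction x1 and x2 load on the first PC with the same sign and equal magnitude, yet their correlation is exactly zero; both loadings arise only through their shared relationship with x3.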
Pages: 48-56
Number of pages: 9