How many principal components? Stopping rules for determining the number of non-trivial axes revisited

Cited by: 608
Authors
Peres-Neto, PR [1]
Jackson, DA [1]
Somers, KM [1]
Affiliation
[1] Univ Toronto, Dept Zool, Toronto, ON M5S 3G5, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
principal component analysis; stopping rules; Monte Carlo simulations;
DOI
10.1016/j.csda.2004.06.015
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Principal component analysis is one of the most widely applied tools for summarizing common patterns of variation among variables. Several studies have investigated the ability of individual methods, or compared the performance of a number of methods, in determining the number of components describing common variance of simulated data sets. We identify several shortcomings of these studies and conduct an extensive simulation study in which we compare a larger number of available rules and develop some new methods. In total we compare 20 stopping rules and propose a two-step approach that appears to be highly effective. First, Bartlett's test is used to test the significance of the first principal component, indicating whether or not at least two variables share common variation in the entire data set. If the test is significant, a number of different rules can be applied to estimate the number of non-trivial components to be retained. However, the relative merits of these methods depend on whether the data contain strongly correlated or uncorrelated variables. We also estimate the number of non-trivial components for a number of field data sets so that we can evaluate the applicability of our conclusions based on simulated data. © 2004 Elsevier B.V. All rights reserved.
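The two-step idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the first step uses Bartlett's sphericity statistic on the correlation matrix, with a permutation-based p-value swapped in for the usual chi-square reference distribution so that the sketch needs only NumPy; the second step uses the broken-stick rule as one example of a stopping rule (the abstract does not say which rules are preferred). The function names, `alpha`, and `n_perm` are hypothetical.

```python
import numpy as np


def bartlett_statistic(X):
    """Bartlett's sphericity statistic: -(n - 1 - (2p + 5) / 6) * ln|R|."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)  # log-determinant of the correlation matrix
    return -(n - 1 - (2 * p + 5) / 6.0) * logdet


def permutation_p_value(X, n_perm=499, seed=0):
    """Assumed step 1: permutation p-value for the Bartlett statistic.

    Each column is shuffled independently, which removes any correlation
    between variables while keeping their marginal distributions.
    """
    rng = np.random.default_rng(seed)
    observed = bartlett_statistic(X)
    count = 1  # the observed value counts as one member of the null set
    for _ in range(n_perm):
        Xp = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        if bartlett_statistic(Xp) >= observed:
            count += 1
    return count / (n_perm + 1)


def broken_stick_count(X):
    """Assumed step 2: count leading eigenvalues above broken-stick values."""
    p = X.shape[1]
    R = np.corrcoef(X, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(R))[::-1]  # eigenvalues, descending
    # Expected k-th largest piece of a stick of total length p broken at random
    expected = [np.sum(1.0 / np.arange(k, p + 1)) for k in range(1, p + 1)]
    keep = 0
    for obs, exp in zip(eig, expected):
        if obs <= exp:
            break
        keep += 1
    return keep


def two_step(X, alpha=0.05):
    """Return 0 if step 1 is not significant, else the step 2 estimate."""
    if permutation_p_value(X) > alpha:
        return 0
    return broken_stick_count(X)
```

As a usage sketch, `two_step(X)` takes an `n x p` data array with observations in rows. Other stopping rules (for example parallel analysis) could replace `broken_stick_count` without changing the overall structure.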
Pages: 974-997
Page count: 24