Principal component analysis

Cited by: 597
Authors
Greenacre, Michael [1, 2]
Groenen, Patrick J. F. [3]
Hastie, Trevor [4, 5]
D'Enza, Alfonso Iodice [6]
Markos, Angelos [7]
Tuzhilina, Elena [4]
Affiliations
[1] Univ Pompeu Fabra, Dept Econ & Business, Barcelona, Spain
[2] Barcelona Sch Management, Barcelona, Spain
[3] Erasmus Univ, Erasmus Sch Econ, Econometr Inst, Rotterdam, Netherlands
[4] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[5] Stanford Univ, Dept Biomed Sci, Stanford, CA 94305 USA
[6] Univ Naples Federico II, Dept Polit Sci, Naples, Italy
[7] Democritus Univ Thrace, Dept Primary Educ, Alexandroupolis, Greece
Source
NATURE REVIEWS METHODS PRIMERS | 2022 / Vol. 2 / Issue 01
Funding
US National Science Foundation; US National Institutes of Health
Keywords
CANONICAL CORRESPONDENCE-ANALYSIS; SYMBOLIC DATA-ANALYSIS; REDUNDANCY ANALYSIS; STOPPING RULES; MATRIX; NUMBER; RANK; REGULARIZATION; DECOMPOSITION; ALGORITHMS;
DOI
10.1038/s43586-022-00184-w
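Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes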
07 ; 0710 ; 09 ;
Abstract
Principal component analysis is a versatile statistical method for reducing a cases-by-variables data table to its essential features, called principal components. Principal components are a few linear combinations of the original variables that maximally explain the variance of all the variables. In the process, the method provides an approximation of the original data table using only these few major components. This Primer presents a comprehensive review of the method's definition and geometry, as well as the interpretation of its numerical and graphical results. The main graphical result is often in the form of a biplot, using the major components to map the cases and adding the original variables to support the distance interpretation of the cases' positions. Variants of the method are also treated, such as the analysis of grouped data and categorical data, known as correspondence analysis. Also described and illustrated are the latest innovative applications of principal component analysis: for estimating missing values in huge data matrices, sparse component estimation, and the analysis of images, shapes and functions. Supplementary material includes video animations and computer scripts in the R environment.
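
As a minimal illustration of the ideas summarized in the abstract (not taken from the Primer or its R supplementary scripts), the sketch below computes principal components of a small cases-by-variables table in plain Python. The toy table X, the function names, and the choice of power iteration with deflation are all assumptions made for brevity; practical work would normally rely on a singular value decomposition routine.

```python
import math

def center(X):
    """Subtract each variable's mean; return the centered table and the means."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X], means

def covariance(Xc):
    """Sample covariance matrix of an already centered table."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenpair(S, iters=1000):
    """Power iteration: largest eigenvalue and unit eigenvector of symmetric S.
    Assumes the start vector is not orthogonal to the leading eigenvector."""
    p = len(S)
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:          # nothing left to extract
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(S[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v

def pca(X, k):
    """Return scores, components, component variances, total variance,
    and the rank-k approximation of X."""
    Xc, means = center(X)
    S = covariance(Xc)
    p = len(S)
    total_var = sum(S[j][j] for j in range(p))
    components, variances = [], []
    for _ in range(k):
        lam, v = leading_eigenpair(S)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the variance already explained by v.
        S = [[S[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    # Scores are the cases' coordinates on the components (linear combinations).
    scores = [[sum(r[j] * v[j] for j in range(p)) for v in components]
              for r in Xc]
    # Approximation of the original table from the first k components only.
    approx = [[means[j] + sum(s[m] * components[m][j] for m in range(k))
               for j in range(p)] for s in scores]
    return scores, components, variances, total_var, approx

# Hypothetical toy table: 5 cases by 3 variables.
X = [[2.5, 2.4, 1.0],
     [0.5, 0.7, 0.9],
     [2.2, 2.9, 1.2],
     [1.9, 2.2, 0.8],
     [3.1, 3.0, 1.1]]

scores, components, variances, total_var, approx = pca(X, k=2)
print("share of variance explained:", sum(variances) / total_var)
```

The two columns of scores are the coordinates that would be used to map the cases in a two-dimensional display, and approx is the reduced-rank approximation of the table that the abstract refers to.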
Pages: 21