Sparse principal component analysis via regularized low rank matrix approximation

Cited by: 452
Authors
Shen, Haipeng [1 ]
Huang, Jianhua Z. [2 ]
Affiliations
[1] Univ N Carolina, Dept Stat & Operat Res, Chapel Hill, NC 27599 USA
[2] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
Funding
National Science Foundation (US);
Keywords
dimension reduction; high-dimension-low-sample-size; regularization; singular value decomposition; thresholding;
DOI
10.1016/j.jmva.2007.06.007
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject Classification Codes
020208; 070103; 0714;
Abstract
Principal component analysis (PCA) is a widely used tool for data analysis and dimension reduction in applications throughout science and engineering. However, the principal components (PCs) can sometimes be difficult to interpret because they are linear combinations of all the original variables. To facilitate interpretation, sparse PCA produces modified PCs with sparse loadings, i.e., loadings with very few non-zero elements. In this paper, we propose a new sparse PCA method, namely sparse PCA via regularized SVD (sPCA-rSVD). We use the connection of PCA with the singular value decomposition (SVD) of the data matrix and extract the PCs by solving a low-rank matrix approximation problem. Regularization penalties are introduced into the corresponding minimization problem to promote sparsity in the PC loadings. An efficient iterative algorithm is proposed for computation. Two tuning-parameter selection methods are discussed. Some theoretical results are established to justify the use of sPCA-rSVD when only the data covariance matrix is available. In addition, we give a modified definition of the variance explained by the sparse PCs. The sPCA-rSVD approach provides a uniform treatment of both classical multivariate data and high-dimension-low-sample-size (HDLSS) data. Further understanding of sPCA-rSVD and some existing alternatives is gained through simulation studies and real data examples, which suggest that sPCA-rSVD provides competitive results. (C) 2007 Elsevier Inc. All rights reserved.
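
The iterative algorithm mentioned in the abstract can be sketched for the rank-one case. Below is a minimal Python illustration, not the authors' implementation: it assumes an L1-type penalty, whose closed-form loading update is soft thresholding (one of the thresholding rules the sPCA-rSVD framework accommodates), and all function names, the penalty value lam, and the stopping rule are chosen here purely for illustration.

import numpy as np

def soft_threshold(z, lam):
    # Soft-thresholding: sign(z) * max(|z| - lam, 0), the closed-form
    # update for an L1 (lasso-type) penalty on the loading vector.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def spca_rsvd_rank1(X, lam, max_iter=200, tol=1e-8):
    # One sparse PC from a regularized rank-one SVD of X (n x p):
    # minimize ||X - u v^T||_F^2 + penalty(v) subject to ||u|| = 1,
    # alternating between the v-update and the u-update.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], s[0] * Vt[0]               # warm start: leading SVD pair
    for _ in range(max_iter):
        v_new = soft_threshold(X.T @ u, lam)   # sparse loading update
        Xv = X @ v_new
        norm = np.linalg.norm(Xv)
        if norm == 0.0:                        # penalty zeroed v out entirely
            return u, v_new
        u_new = Xv / norm                      # unit-norm left vector update
        converged = np.linalg.norm(v_new - v) < tol * max(np.linalg.norm(v), 1.0)
        u, v = u_new, v_new
        if converged:
            break
    return u, v

# Example: one sparse loading vector from column-centered data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
u, v = spca_rsvd_rank1(X - X.mean(axis=0), lam=1.0)
loadings = v / np.linalg.norm(v) if np.linalg.norm(v) > 0 else v
print(np.round(loadings, 3))

Subsequent sparse PCs would be extracted sequentially by applying the same routine to the deflated residual matrix X - u v^T, with the paper's modified definition of explained variance used to assess each component.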
Pages: 1015-1034
Page count: 20