Eigenvectors from Eigenvalues Sparse Principal Component Analysis

Cited by: 8
Authors
Frost, H. Robert [1]
Affiliations
[1] Dartmouth Coll, Dept Biomed Data Sci, Hanover, NH 03755 USA
Funding
National Institutes of Health (USA);
Keywords
Eigenvector-eigenvalue identity; Principal component analysis; Sparse eigenvalue decomposition; Sparse principal component analysis;
DOI
10.1080/10618600.2021.1987254
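Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract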
We present a novel technique for sparse principal component analysis. This method, named eigenvectors from eigenvalues sparse principal component analysis (EESPCA), is based on the formula for computing squared eigenvector loadings of a Hermitian matrix from the eigenvalues of the full matrix and associated sub-matrices. We explore two versions of the EESPCA method: a version that uses a fixed threshold for inducing sparsity and a version that selects the threshold via cross-validation. Relative to the state-of-the-art sparse PCA methods of Witten et al., Yuan and Zhang, and Tan et al., the fixed threshold EESPCA technique offers an order-of-magnitude improvement in computational speed, does not require estimation of tuning parameters via cross-validation, and can more accurately identify true zero principal component loadings across a range of data matrix sizes and covariance structures. Importantly, the EESPCA method achieves these benefits while maintaining out-of-sample reconstruction error and PC estimation error close to the lowest error generated by all evaluated approaches. EESPCA is a practical and effective technique for sparse PCA with particular relevance to computationally demanding statistical problems such as the analysis of high-dimensional datasets or application of statistical techniques like resampling that involve the repeated calculation of sparse PCs. Supplementary materials for this article are available online.
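The eigenvector-eigenvalue identity that the abstract refers to can be checked numerically. The sketch below is not the EESPCA algorithm itself (the abstract does not give its thresholding or approximation steps); it only illustrates the underlying formula for a real symmetric matrix, assuming the eigenvalues are distinct. For an n x n symmetric matrix A with eigenvalues lambda_k(A), and the sub-matrix M_j formed by deleting row and column j, the squared j-th loading of eigenvector i equals prod_k (lambda_i(A) - lambda_k(M_j)) divided by prod_{k != i} (lambda_i(A) - lambda_k(A)). The function name `squared_loadings_from_eigenvalues` and the simulated data are illustrative assumptions.

```python
import numpy as np


def squared_loadings_from_eigenvalues(A, i):
    """Squared components of the i-th eigenvector of a symmetric matrix A,
    computed only from eigenvalues of A and of its principal sub-matrices.
    Assumes the eigenvalues of A are distinct (illustrative helper)."""
    n = A.shape[0]
    lam = np.linalg.eigvalsh(A)  # eigenvalues in ascending order
    # Denominator: product of (lambda_i - lambda_k) over all k != i.
    denom = np.prod(np.delete(lam[i] - lam, i))
    out = np.empty(n)
    for j in range(n):
        # Sub-matrix M_j: remove row j and column j.
        minor = np.delete(np.delete(A, j, axis=0), j, axis=1)
        mu = np.linalg.eigvalsh(minor)
        # Numerator: product of (lambda_i - mu_k) over the n-1 sub-matrix eigenvalues.
        out[j] = np.prod(lam[i] - mu) / denom
    return out


# Simulated data (an assumption for illustration): 50 samples, 5 variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
S = np.cov(X, rowvar=False)

i = S.shape[0] - 1  # index of the largest eigenvalue, i.e. the first PC
sq = squared_loadings_from_eigenvalues(S, i)

# Reference values from a standard eigendecomposition.
_, vecs = np.linalg.eigh(S)
direct = vecs[:, i] ** 2
```

Here `sq` and `direct` should agree up to floating-point error. The identity recovers only squared loadings, so the signs of the loadings have to be obtained some other way.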
Pages: 486-501
Number of pages: 16
Related papers
40 entries in total
[1]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[2]   The Gene Ontology in 2010: extensions and refinements The Gene Ontology Consortium [J].
Berardini, Tanya Z. ;
Li, Donghui ;
Huala, Eva ;
Bridges, Susan ;
Burgess, Shane ;
McCarthy, Fiona ;
Carbon, Seth ;
Lewis, Suzanna E. ;
Mungall, Christopher J. ;
Abdulla, Amina ;
Wood, Valerie ;
Feltrin, Erika ;
Valle, Giorgio ;
Chisholm, Rex L. ;
Fey, Petra ;
Gaudet, Pascale ;
Kibbe, Warren ;
Basu, Siddhartha ;
Bushmanova, Yulia ;
Eilbeck, Karen ;
Siegele, Deborah A. ;
McIntosh, Brenley ;
Renfro, Daniel ;
Zweifel, Adrienne ;
Hu, James C. ;
Ashburner, Michael ;
Tweedie, Susan ;
Alam-Faruque, Yasmin ;
Apweiler, Rolf ;
Auchinchloss, Andrea ;
Bairoch, Amos ;
Barrell, Daniel ;
Binns, David ;
Blatter, Marie-Claude ;
Bougueleret, Lydie ;
Boutet, Emmanuel ;
Breuza, Lionel ;
Bridge, Alan ;
Browne, Paul ;
Chan, Wei Mun ;
Coudert, Elizabeth ;
Daugherty, Louise ;
Dimmer, Emily ;
Eberhardt, Ruth ;
Estreicher, Anne ;
Famiglietti, Livia ;
Ferro-Rojas, Serenella ;
Feuermann, Marc ;
Foulger, Rebecca ;
Gruaz-Gumowski, Nadine .
NUCLEIC ACIDS RESEARCH, 2010, 38 :D331-D335
[3]  
Carlson M., 2020, GODB SET ANNOTATION
[4]   A direct formulation for sparse PCA using semidefinite programming [J].
d'Aspremont, Alexandre ;
El Ghaoui, Laurent ;
Jordan, Michael I. ;
Lanckriet, Gert R. G. .
SIAM REVIEW, 2007, 49 (03) :434-448
[5]  
DENTON PB, 2021, B AM MATH SOC, V1
[6]  
Fan J, 2016, NAT METHODS, V13, P241, DOI [10.1038/nmeth.3734, 10.1038/NMETH.3734]
[7]   SPARSE CCA: ADAPTIVE ESTIMATION AND COMPUTATIONAL BARRIERS [J].
Gao, Chao ;
Ma, Zongming ;
Zhou, Harrison H. .
ANNALS OF STATISTICS, 2017, 45 (05) :2074-2101
[8]   Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression [J].
Hafemeister, Christoph ;
Satija, Rahul .
GENOME BIOLOGY, 2019, 20 (01)
[9]  
Hastie T., 2009, The Elements of Statistical Learning: Data Mining, Inference and Prediction, V2nd, DOI [10.1007/978-0-387-84858-7, DOI 10.1007/978-0-387-84858-7]
[10]   Analysis of a complex of statistical variables into principal components [J].
Hotelling, H .
JOURNAL OF EDUCATIONAL PSYCHOLOGY, 1933, 24 :417-441