Cauchy robust principal component analysis with applications to high-dimensional data sets

Authors
Aisha Fayomi
Yannis Pantazis
Michail Tsagris
Andrew T. A. Wood
Affiliations
[1] King Abdulaziz University, Department of Statistics
[2] Foundation for Research and Technology - Hellas, Institute of Applied and Computational Mathematics
[3] University of Crete, Department of Economics
[4] Australian National University, Research School of Finance, Actuarial Studies & Statistics
Source
Statistics and Computing | 2024 / Vol. 34
Keywords
Principal component analysis; Robust; Cauchy log-likelihood; High-dimensional data
DOI
Not available
Abstract
Principal component analysis (PCA) is a standard dimensionality reduction technique used in various research and applied fields. From an algorithmic point of view, classical PCA can be formulated in terms of operations on a multivariate Gaussian likelihood. As a consequence of the implied Gaussian formulation, the principal components are not robust to outliers. In this paper, we propose a modified formulation, based on the use of a multivariate Cauchy likelihood instead of the Gaussian likelihood, which has the effect of robustifying the principal components. We present an algorithm to compute these robustified principal components. We additionally derive the relevant influence function of the first component and examine its theoretical properties. Simulation experiments on high-dimensional datasets demonstrate that the estimated principal components based on the Cauchy likelihood typically outperform, or are on a par with, existing robust PCA techniques. Moreover, the Cauchy PCA algorithm we have used has much lower computational cost in very high dimensional settings than the other public domain robust PCA methods we consider.
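The robustness idea in the abstract can be illustrated with a small two-dimensional sketch. It is not the authors' algorithm, which the abstract does not spell out. It rests on one certain fact and one assumption. The fact: for a Gaussian fitted to the projections of the data onto a unit vector, the fitted scale is the sample standard deviation, so picking the direction with the largest fitted scale recovers the classical first principal direction. The assumption: a Cauchy analogue is obtained by fitting a univariate Cauchy to the projections instead and again picking the direction with the largest fitted scale. The function names `cauchy_fit`, `sample_sd` and `first_direction`, the angle grid, and the toy data are all hypothetical choices made for this illustration.

```python
import math

def cauchy_fit(z, iters=100):
    """Location and scale of a univariate Cauchy via the standard EM
    iteration for a Student t with 1 degree of freedom."""
    zs = sorted(z)
    n = len(zs)
    mu = zs[n // 2]                                      # start at the median
    sigma = (zs[(3 * n) // 4] - zs[n // 4]) / 2 or 1.0   # start at half the IQR
    for _ in range(iters):
        # E-step weights: far-away points get weights close to zero
        w = [2.0 / (1.0 + ((v - mu) / sigma) ** 2) for v in z]
        # M-step: weighted mean, then weighted scale around the new mean
        mu = sum(wi * v for wi, v in zip(w, z)) / sum(w)
        sigma = math.sqrt(sum(wi * (v - mu) ** 2 for wi, v in zip(w, z)) / n)
    return mu, sigma

def sample_sd(z):
    """Gaussian maximum-likelihood scale (standard deviation with divisor n)."""
    m = sum(z) / len(z)
    return math.sqrt(sum((v - m) ** 2 for v in z) / len(z))

def first_direction(points, spread, n_angles=90):
    """Grid search over unit vectors (cos t, sin t), t in [0, pi), for the
    direction whose projected data have the largest value of `spread`."""
    best_u, best_s = None, -1.0
    for k in range(n_angles):
        t = math.pi * k / n_angles
        u = (math.cos(t), math.sin(t))
        s = spread([x * u[0] + y * u[1] for x, y in points])
        if s > best_s:
            best_u, best_s = u, s
    return best_u

# Toy data (an assumption for illustration): the bulk lies along the x-axis,
# and five gross outliers sit far away in the y direction.
inliers = [(0.2 * k, 0.5 if k % 2 == 0 else -0.5) for k in range(-50, 51)]
outliers = [(x, 60.0) for x in (-8.0, -4.0, 0.0, 4.0, 8.0)]
points = inliers + outliers

classical_dir = first_direction(points, sample_sd)
cauchy_dir = first_direction(points, lambda z: cauchy_fit(z)[1])
print("largest sample SD:    ", classical_dir)
print("largest Cauchy scale: ", cauchy_dir)
```

On data of this kind the sample-SD direction should be dragged onto the outlier axis, while the Cauchy-scale direction should stay close to the axis of the bulk, because the EM weights shrink the contribution of distant points. The grid search only works in two dimensions; a high-dimensional version would need a proper optimizer over unit vectors, which this sketch does not attempt.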