Cauchy robust principal component analysis with applications to high-dimensional data sets

Cited by: 0
Authors
Aisha Fayomi
Yannis Pantazis
Michail Tsagris
Andrew T. A. Wood
Institutions
[1] King Abdulaziz University,Department of Statistics
[2] Foundation for Research and Technology - Hellas,Institute of Applied and Computational Mathematics
[3] University of Crete,Department of Economics
[4] Australian National University,Research School of Finance, Actuarial Studies & Statistics
Source
Statistics and Computing | 2024 / Volume 34
Keywords
Principal component analysis; Robust; Cauchy log-likelihood; High-dimensional data;
DOI
Not available
Abstract
Principal component analysis (PCA) is a standard dimensionality reduction technique used in various research and applied fields. From an algorithmic point of view, classical PCA can be formulated in terms of operations on a multivariate Gaussian likelihood. As a consequence of the implied Gaussian formulation, the principal components are not robust to outliers. In this paper, we propose a modified formulation, based on the use of a multivariate Cauchy likelihood instead of the Gaussian likelihood, which has the effect of robustifying the principal components. We present an algorithm to compute these robustified principal components. We additionally derive the relevant influence function of the first component and examine its theoretical properties. Simulation experiments on high-dimensional datasets demonstrate that the estimated principal components based on the Cauchy likelihood typically outperform, or are on a par with, existing robust PCA techniques. Moreover, the Cauchy PCA algorithm we have used has much lower computational cost in very high-dimensional settings than the other public-domain robust PCA methods we consider.
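The abstract does not spell out the algorithm, so the following is only a rough illustration of the underlying idea, not the authors' method: heavy-tailed Cauchy weights downweight outliers when fitting a location and scale to projected data, and a robust first component can then be sought as the direction whose projections have the largest fitted Cauchy scale. The function names `cauchy_scale` and `cauchy_pc1`, and the random-search over directions, are hypothetical choices made for this sketch.

```python
import numpy as np

def cauchy_scale(z, iters=100):
    """Fixed-point iteration for the MLE of a univariate Cauchy's
    location m and scale s (standard EM-style reweighting)."""
    z = np.asarray(z, dtype=float)
    m = np.median(z)
    s = np.median(np.abs(z - m)) + 1e-12
    for _ in range(iters):
        w = 1.0 / (s**2 + (z - m) ** 2)          # per-point Cauchy weights
        m = np.sum(w * z) / np.sum(w)            # weighted location update
        s = np.sqrt(len(z) / (2.0 * np.sum(w)))  # scale from the score equation
    return m, s

def cauchy_pc1(X, n_dirs=2000, seed=0):
    """Crude surrogate for a robust first PC (NOT the paper's algorithm):
    among random unit directions, return the one maximizing the fitted
    Cauchy scale of the projections, so extreme outliers carry little weight."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_dirs, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    scales = [cauchy_scale(X @ u)[1] for u in U]
    return U[int(np.argmax(scales))]
```

Because the Cauchy weights `1/(s² + (z − m)²)` shrink quadratically for distant points, a handful of gross outliers barely moves the fitted scale, whereas they can dominate the sample variance that classical PCA maximizes.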
Related papers
50 items in total
  • [31] High-dimensional Data Classification Based on Principal Component Analysis Dimension Reduction and Improved BP Algorithm
    Yan, Tai-shan
    Wen, Yi-ting
    Li, Wen-bin
    2018 INTERNATIONAL CONFERENCE ON COMMUNICATION, NETWORK AND ARTIFICIAL INTELLIGENCE (CNAI 2018), 2018, : 441 - 445
  • [32] Efficient high-dimensional indexing by sorting principal component
    Cui, Jiangtao
    Zhou, Shuisheng
    Sun, Junding
    PATTERN RECOGNITION LETTERS, 2007, 28 (16) : 2412 - 2418
  • [33] Sparse common component analysis for multiple high-dimensional datasets via noncentered principal component analysis
    Park, Heewon
    Konishi, Sadanori
    STATISTICAL PAPERS, 2020, 61 (06) : 2283 - 2311
  • [35] On Closeness Between Factor Analysis and Principal Component Analysis Under High-Dimensional Conditions
    Liang, L.
    Hayashi, K.
    Yuan, Ke-Hai
    Quantitative Psychology Research, 2015, 140 : 209 - 221
  • [36] Objective-sensitive principal component analysis for high-dimensional inverse problems
    Elizarev, Maksim
    Mukhin, Andrei
    Khlyupin, Aleksey
    COMPUTATIONAL GEOSCIENCES, 2021, 25 (06) : 2019 - 2031
  • [38] Asymptotic Distribution of Studentized Contribution Ratio in High-Dimensional Principal Component Analysis
    Hyodo, Masashi
    Yamada, Takayuki
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2009, 38 (04) : 905 - 917
  • [39] High-dimensional inference with the generalized Hopfield model: Principal component analysis and corrections
    Cocco, S.
    Monasson, R.
    Sessak, V.
    PHYSICAL REVIEW E, 2011, 83 (05):
  • [40] Robust PCA for high-dimensional data
    Hubert, M
    Rousseeuw, PJ
    Verboven, S
    DEVELOPMENTS IN ROBUST STATISTICS, 2003, : 169 - 179