Sparse PCA for High-Dimensional Data With Outliers

Times cited: 47
Authors
Hubert, Mia [1]
Reynkens, Tom [1]
Schmitt, Eric [1]
Verdonck, Tim [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Math, Leuven, Belgium
Keywords
Dimension reduction; Outlier detection; Robustness; PROJECTION-PURSUIT APPROACH; PRINCIPAL COMPONENTS; ROBUST PCA;
DOI
10.1080/00401706.2015.1093962
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208; 070103; 0714
Abstract
A new sparse PCA algorithm is presented, which is robust against outliers. The approach is based on the ROBPCA algorithm that generates robust but nonsparse loadings. The construction of the new ROSPCA method is detailed, as well as a selection criterion for the sparsity parameter. An extensive simulation study and a real data example are performed, showing that it is capable of accurately finding the sparse structure of datasets, even when challenging outliers are present. In comparison with a projection pursuit-based algorithm, ROSPCA demonstrates superior robustness properties and comparable sparsity estimation capability, as well as significantly faster computation time.
Pages: 424-434
Number of pages: 11
Related papers
50 in total
  • [21] Sparse kernel methods for high-dimensional survival data
    Evers, Ludger
    Messow, Claudia-Martina
    BIOINFORMATICS, 2008, 24 (14) : 1632 - 1638
  • [22] Sparse meta-analysis with high-dimensional data
    He, Qianchuan
    Zhang, Hao Helen
    Avery, Christy L.
    Lin, D. Y.
    BIOSTATISTICS, 2016, 17 (02) : 205 - 220
  • [23] Efficient Sparse Representation for Learning With High-Dimensional Data
    Chen, Jie
    Yang, Shengxiang
    Wang, Zhu
    Mao, Hua
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (08) : 4208 - 4222
  • [24] Ensemble of sparse classifiers for high-dimensional biological data
    Kim, Sunghan
    Scalzo, Fabien
    Telesca, Donatello
    Hu, Xiao
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2015, 12 (02) : 167 - 183
  • [25] Subspace Clustering of Very Sparse High-Dimensional Data
    Peng, Hankui
    Pavlidis, Nicos
    Eckley, Idris
    Tsalamanis, Ioannis
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 3780 - 3783
  • [26] Robust PCA for high-dimensional data based on characteristic transformation
    He, Lingyu
    Yang, Yanrong
    Zhang, Bo
    AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS, 2023, 65 (02) : 127 - 151
  • [27] Single-Pass PCA of Large High-Dimensional Data
    Yu, Wenjian
    Gu, Yu
    Li, Jian
    Liu, Shenghua
    Li, Yaohang
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3350 - 3356
  • [28] High-Dimensional Cross-Plant Process Monitoring With Data Privacy: A Federated Hierarchical Sparse PCA Approach
    Wang, Kai
    Song, Zhenli
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20 (03) : 4385 - 4396
  • [29] Depthgram: Visualizing outliers in high-dimensional functional data with application to fMRI data exploration
    Aleman-Gomez, Yasser
    Arribas-Gil, Ana
    Desco, Manuel
    Elias, Antonio
    Romo, Juan
    STATISTICS IN MEDICINE, 2022, 41 (11) : 2005 - 2024
  • [30] Categorical Data Analysis for High-Dimensional Sparse Gene Expression Data
    Dousti Mousavi, Niloufar
    Aldirawi, Hani
    Yang, Jie
    BIOTECH, 2023, 12 (03):