Robust Principal Component Analysis: A Median of Means Approach

Cited by: 8
Authors
Paul, Debolina [1]
Chakraborty, Saptarshi [2]
Das, Swagatam [3,4]
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[3] Indian Stat Inst, Elect & Commun Sci Unit, Kolkata 700108, India
[4] Inst Adv Intelligence IAI, TCG CREST, Kolkata 700091, India
Keywords
Principal component analysis; Method of moments; Sparse matrices; Covariance matrices; Hilbert space; Estimation; Philosophical considerations; Median of Means (MoM); nonasymptotic error bounds; principal component analysis (PCA); robust dimensionality reduction; robust PCA; matrix; image; PCA
DOI
10.1109/TNNLS.2023.3298011
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Principal component analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction, widely used in statistics, machine learning, computer vision, and related fields. However, PCA is well known to fall prey to outliers and often fails to detect the true underlying low-dimensional structure in a dataset. Following the Median of Means (MoM) philosophy, recent supervised learning methods have shown great success in handling outlying observations without much compromise to their large-sample theoretical properties. This article proposes a PCA procedure based on the MoM principle. Called MoMPCA, the proposed method is computationally appealing and achieves optimal convergence rates under minimal assumptions. In particular, we derive nonasymptotic error bounds for the obtained solution via Rademacher complexities, while making no assumptions whatsoever on the outlying observations. The derived concentration results do not depend on the dimension, because the analysis is conducted in a separable Hilbert space, and they depend only on the fourth moment of the underlying distribution in the corresponding norm. The proposal's efficacy is thoroughly demonstrated through simulations and real-data applications.
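The MoM philosophy the abstract refers to replaces an empirical mean with the median of block-wise means: partition the sample into disjoint blocks, average the loss within each block, and take the median across blocks, so that a few gross outliers can corrupt only a minority of blocks. The following Python sketch illustrates that principle for choosing a robust first principal direction; it is illustrative only, not the paper's exact MoMPCA algorithm, and all names and parameter choices here are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 300 inliers stretched along the x-axis, plus 5 gross
# outliers placed far away along the y-axis.
X = rng.normal(size=(300, 2)) * np.array([3.0, 0.5])
X[:5] = rng.normal(loc=(0.0, 50.0), scale=1.0, size=(5, 2))

def mom_objective(v, X, n_blocks=20, rng=rng):
    """Median over blocks of the mean squared reconstruction error of
    projecting onto the unit vector v (a MoM analog of PCA's loss)."""
    resid = X - np.outer(X @ v, v)          # residual after rank-1 projection
    losses = (resid ** 2).sum(axis=1)       # per-sample reconstruction error
    blocks = np.array_split(rng.permutation(losses), n_blocks)
    return np.median([b.mean() for b in blocks])

# Cheap candidate set: top eigenvector of each block's covariance matrix.
candidates = []
for idx in np.array_split(rng.permutation(len(X)), 20):
    _, V = np.linalg.eigh(np.cov(X[idx], rowvar=False))
    candidates.append(V[:, -1])             # eigh sorts eigenvalues ascending

# Pick the candidate direction with the smallest MoM objective.
best = min(candidates, key=lambda v: mom_objective(v, X))
print(best)  # points (up to sign) roughly along the x-axis
```

Vanilla PCA on this data would latch onto the five outliers at y ≈ 50, since they dominate the sample covariance; the MoM objective votes down any direction whose apparent fit depends on a minority of contaminated blocks.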
Pages: 16788-16800
Number of pages: 13