Randomized Robust Subspace Recovery and Outlier Detection for High Dimensional Data Matrices

Cited by: 32
Authors
Rahmani, Mostafa [1]
Atia, George K. [1]
Affiliations
[1] Univ Cent Florida, Dept Elect & Comp Engn, Orlando, FL 32816 USA
Funding
National Science Foundation (USA)
Keywords
Big data; column/row sampling; low rank matrix; outlier detection; randomized algorithm; random embedding; robust PCA; sketching; subspace learning; FACTORIZATION; INCOHERENCE; SPARSITY; JOHNSON; PROOF
DOI
10.1109/TSP.2016.2645515
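Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract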
This paper explores and analyzes two randomized designs for robust principal component analysis employing low-dimensional data sketching. In one design, a data sketch is constructed using random column sampling followed by low-dimensional embedding, while in the other, sketching is based on random column and row sampling. Both designs are shown to bring about substantial savings in complexity and memory requirements for robust subspace learning over conventional approaches that use the full-scale data. A characterization of the sample and computational complexity of both designs is derived in the context of two distinct outlier models, namely, sparse and independent outlier models. The proposed randomized approach can provably recover the correct subspace with computational and sample complexity that depend only weakly on the size of the data (only through the coherence parameters). The results of the mathematical analysis are confirmed through numerical simulations using both synthetic and real data.
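The two sketching designs named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the Gaussian embedding matrix, uniform sampling without replacement, and all sizes below are assumptions chosen for illustration, and the robust PCA step that would run on the sketch is omitted.

```python
import numpy as np

def sketch_columns_then_embed(D, n_cols, embed_dim, rng):
    """Design 1 (illustrative): sample columns, then embed into fewer rows.

    Assumes uniform column sampling and a Gaussian embedding matrix;
    the abstract does not specify either choice.
    """
    n_rows, n_total = D.shape
    cols = rng.choice(n_total, size=n_cols, replace=False)
    Phi = rng.standard_normal((embed_dim, n_rows)) / np.sqrt(embed_dim)
    return Phi @ D[:, cols], cols, Phi

def sketch_columns_and_rows(D, n_cols, n_rows_keep, rng):
    """Design 2 (illustrative): sample columns and rows uniformly at random."""
    rows = rng.choice(D.shape[0], size=n_rows_keep, replace=False)
    cols = rng.choice(D.shape[1], size=n_cols, replace=False)
    return D[np.ix_(rows, cols)], rows, cols

# Hypothetical sizes: a 200 x 1000 matrix of rank 5, sketched down to 20 x 50.
rng = np.random.default_rng(0)
N1, N2, r = 200, 1000, 5
L = rng.standard_normal((N1, r)) @ rng.standard_normal((r, N2))

S1, cols1, Phi = sketch_columns_then_embed(L, n_cols=50, embed_dim=20, rng=rng)
S2, rows2, cols2 = sketch_columns_and_rows(L, n_cols=50, n_rows_keep=20, rng=rng)

# For this generic low-rank matrix, both sketches keep the rank of L.
rank1 = np.linalg.matrix_rank(S1)
rank2 = np.linalg.matrix_rank(S2)
```

In both cases the subsequent subspace learning would operate on a 20 x 50 sketch rather than the full 200 x 1000 matrix, which is the source of the complexity and memory savings the abstract describes.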
Pages: 1580-1594
Number of pages: 15
Related papers
50 in total
  • [21] RODD: Robust Outlier Detection in Data Cubes
    Kuhlmann, Lara
    Wilmes, Daniel
    Mueller, Emmanuel
    Pauly, Markus
    Horn, Daniel
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2023, 2023, 14148 : 325 - 339
  • [22] Autoencoder-based outlier detection for sparse, high dimensional data
    Chen, Wanghu
    Li, Huijun
    Li, Jing
    Arshad, Ali
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 2735 - 2742
  • [24] ROBOUT: a conditional outlier detection methodology for high-dimensional data
    Farne, Matteo
    Vouldis, Angelos
    STATISTICAL PAPERS, 2024, 65 (04) : 2489 - 2525
  • [25] Outlier Detection for Robust Multi-Dimensional Scaling
    Blouvshtein, Leonid
    Cohen-Or, Daniel
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2019, 41 (09) : 2273 - 2279
  • [26] Fast outlier detection algorithm for high dimensional categorical data streams
    Zhou, Xiao-Yun
    Sun, Zhi-Hui
    Zhang, Bai-Li
    Yang, Yi-Dong
    Ruan Jian Xue Bao/Journal of Software, 2007, 18 (04) : 933 - 942
  • [27] Research on Outlier Detection for High-Dimensional Data Based on PPCLOF
    Chen, Chen
    Luo, Kaiwen
    Min, Lan
    Li, Shenglin
    JOURNAL OF WEB ENGINEERING, 2021, 20 (03) : 743 - 758
  • [28] PCA leverage: outlier detection for high-dimensional functional magnetic resonance imaging data
    Mejia, Amanda F.
    Nebel, Mary Beth
    Eloyan, Ani
    Caffo, Brian
    Lindquist, Martin A.
    BIOSTATISTICS, 2017, 18 (03) : 521 - 536
  • [29] OUTLIER DETECTION WITH ENHANCED ANGLE-BASED OUTLIER FACTOR IN HIGH-DIMENSIONAL DATA STREAM
    Shou, Zhaoyu
    Tian, Hao
    Li, Simin
    Zou, Fengbo
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2018, 14 (05) : 1633 - 1651
  • [30] Robust Outlier Detection Method For Multivariate Spatial Data
    Sweta Shukla
    S. Lalitha
    National Academy Science Letters, 2021, 44 : 551 - 554