Randomized Robust Subspace Recovery and Outlier Detection for High Dimensional Data Matrices

Cited by: 32
Authors
Rahmani, Mostafa [1]
Atia, George K. [1]
Affiliations
[1] Univ Cent Florida, Dept Elect & Comp Engn, Orlando, FL 32816 USA
Funding
National Science Foundation (USA)
Keywords
Big data; column/row sampling; low rank matrix; outlier detection; randomized algorithm; random embedding; robust PCA; sketching; subspace learning; FACTORIZATION; INCOHERENCE; SPARSITY; JOHNSON; PROOF
DOI
10.1109/TSP.2016.2645515
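Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract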
This paper explores and analyzes two randomized designs for robust principal component analysis that employ low-dimensional data sketching. In one design, the data sketch is constructed by random column sampling followed by low-dimensional embedding; in the other, sketching is based on random column and row sampling. Both designs are shown to yield substantial savings in complexity and memory requirements for robust subspace learning compared with conventional approaches that use the full-scale data. The sample and computational complexity of both designs is characterized under two distinct outlier models, namely the sparse and the independent outlier models. The proposed randomized approach provably recovers the correct subspace with computational and sample complexity that depend only weakly on the size of the data (only through the coherence parameters). The results of the mathematical analysis are confirmed by numerical simulations on both synthetic and real data.
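The two sketching designs named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the function names, the Gaussian embedding matrix, uniform sampling without replacement, and all sizes are assumptions chosen for illustration, and the robust subspace recovery step that would run on the sketch is omitted. The example only shows that both sketches are far smaller than the data while keeping its rank.

```python
import numpy as np

def column_sample_then_embed(D, n_cols, embed_dim, rng):
    """Design 1 (illustrative): sample columns, then embed them in a lower dimension."""
    n_rows_full, n_cols_full = D.shape
    cols = rng.choice(n_cols_full, size=n_cols, replace=False)
    # Assumption: a scaled Gaussian matrix serves as the random embedding.
    Phi = rng.standard_normal((embed_dim, n_rows_full)) / np.sqrt(embed_dim)
    return Phi @ D[:, cols], cols, Phi

def column_row_sample(D, n_cols, n_rows, rng):
    """Design 2 (illustrative): keep a random subset of columns and of rows."""
    n_rows_full, n_cols_full = D.shape
    cols = rng.choice(n_cols_full, size=n_cols, replace=False)
    rows = rng.choice(n_rows_full, size=n_rows, replace=False)
    return D[np.ix_(rows, cols)], rows, cols

rng = np.random.default_rng(0)
rank = 3
# Synthetic rank-3 data matrix with 200 rows and 500 columns (no outliers added).
L = rng.standard_normal((200, rank)) @ rng.standard_normal((rank, 500))

S1, cols1, Phi = column_sample_then_embed(L, n_cols=40, embed_dim=20, rng=rng)
S2, rows2, cols2 = column_row_sample(L, n_cols=40, n_rows=20, rng=rng)

print(S1.shape, np.linalg.matrix_rank(S1, tol=1e-8))
print(S2.shape, np.linalg.matrix_rank(S2, tol=1e-8))
```

Each sketch holds 800 entries instead of the 100,000 in the full matrix. How many rows and columns are actually sufficient depends on the coherence parameters analyzed in the paper, which this example does not compute.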
Pages: 1580-1594
Page count: 15
Related papers
50 in total
  • [1] RANDOMIZED ROBUST SUBSPACE RECOVERY FOR BIG DATA
    Rahmani, Mostafa
    Atia, George K.
    2015 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2015,
  • [2] Analysis of Randomized Robust PCA for High Dimensional Data
    Rahmani, Mostafa
    Atia, George K.
    2015 IEEE SIGNAL PROCESSING AND SIGNAL PROCESSING EDUCATION WORKSHOP (SP/SPE), 2015, : 25 - 30
  • [3] A survey on unsupervised subspace outlier detection methods for high dimensional data
    Ahn, Jaehyeong
    Kwon, Sunghoon
    KOREAN JOURNAL OF APPLIED STATISTICS, 2021, 34 (03) : 507 - 521
  • [4] Outlier Detection in High Dimensional Data
    Kamalov, Firuz
    Leung, Ho Hon
    JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2020, 19 (01)
  • [5] Binary Gravitational Subspace Search for Outlier Detection in High Dimensional Data Streams
    Souiden, Imen
    Brahmi, Zaki
    Omri, Mohamed Nazih
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2022, PT II, 2022, 13726 : 157 - 169
  • [6] A survey of outlier detection in high dimensional data streams
    Souiden, Imen
    Omri, Mohamed Nazih
    Brahmi, Zaki
    COMPUTER SCIENCE REVIEW, 2022, 44
  • [7] Subspace Outlier Detection in High Dimensional Data using Ensemble of PCA-based Subspaces
    Riahi-Madvar, Mahboobeh
    Nasersharif, Babak
    Azirani, Ahmad Akbari
    2021 26TH INTERNATIONAL COMPUTER CONFERENCE, COMPUTER SOCIETY OF IRAN (CSICC), 2021,
  • [8] Robust subspace methods for outlier detection in genomic data circumvents the curse of dimensionality
    Shetta, Omar
    Niranjan, Mahesan
    ROYAL SOCIETY OPEN SCIENCE, 2020, 7 (02):
  • [9] A High-dimensional Outlier Detection Algorithm Base on Relevant Subspace
    Gao, Zhipeng
    Zhao, Yang
    Niu, Kun
    Fan, Yidan
    2017 IEEE 15TH INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, 15TH INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, 3RD INTL CONF ON BIG DATA INTELLIGENCE AND COMPUTING AND CYBER SCIENCE AND TECHNOLOGY CONGRESS(DASC/PICOM/DATACOM/CYBERSCI, 2017, : 1001 - 1008
  • [10] Robust local outlier detection with statistical parameter for big data
    Lei, Jingsheng
    Jiang, Teng
    Wu, Kui
    Du, Haizhou
    Zhu, Lin
    COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2015, 30 (05) : 411 - 419