Randomized Robust Subspace Recovery and Outlier Detection for High Dimensional Data Matrices

Citations: 32
Authors
Rahmani, Mostafa [1]
Atia, George K. [1]
Affiliations
[1] Univ Cent Florida, Dept Elect & Comp Engn, Orlando, FL 32816 USA
Funding
U.S. National Science Foundation
Keywords
Big data; column/row sampling; low rank matrix; outlier detection; randomized algorithm; random embedding; robust PCA; sketching; subspace learning; FACTORIZATION; INCOHERENCE; SPARSITY; JOHNSON; PROOF
DOI
10.1109/TSP.2016.2645515
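Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract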
This paper explores and analyzes two randomized designs for robust principal component analysis employing low-dimensional data sketching. In one design, a data sketch is constructed using random column sampling followed by low-dimensional embedding, while in the other, sketching is based on random column and row sampling. Both designs are shown to bring about substantial savings in complexity and memory requirements for robust subspace learning over conventional approaches that use the full-scale data. A characterization of the sample and computational complexity of both designs is derived in the context of two distinct outlier models, namely, the sparse and independent outlier models. The proposed randomized approach can provably recover the correct subspace with computational and sample complexity that depend only weakly on the size of the data (only through the coherence parameters). The results of the mathematical analysis are confirmed through numerical simulations using both synthetic and real data.
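The two sketch constructions named in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: uniform sampling without replacement, a Gaussian embedding matrix, the helper names (`sample_columns`, `sample_rows`, `gaussian_embed`), and all sizes are hypothetical choices made here for illustration. The robust PCA step that would run on each sketch is not shown.

```python
import random

def sample_columns(D, m, rng):
    """Keep m columns of D, chosen uniformly at random without replacement (assumed scheme)."""
    idx = sorted(rng.sample(range(len(D[0])), m))
    return [[row[j] for j in idx] for row in D], idx

def sample_rows(D, r, rng):
    """Keep r rows of D, chosen uniformly at random without replacement (assumed scheme)."""
    idx = sorted(rng.sample(range(len(D)), r))
    return [list(D[i]) for i in idx], idx

def gaussian_embed(D, k, rng):
    """Map each column of D (n1 x m) to k dimensions with a k x n1 Gaussian matrix (assumed embedding)."""
    n1, m = len(D), len(D[0])
    scale = k ** -0.5
    Phi = [[rng.gauss(0.0, 1.0) * scale for _ in range(n1)] for _ in range(k)]
    return [[sum(Phi[a][i] * D[i][j] for i in range(n1)) for j in range(m)]
            for a in range(k)]

rng = random.Random(0)
n1, n2 = 40, 60                      # illustrative data size
D = [[rng.gauss(0.0, 1.0) for _ in range(n2)] for _ in range(n1)]

# Both designs start from the same random subset of columns.
cols, col_idx = sample_columns(D, 12, rng)

# Design 1: column sampling followed by low-dimensional embedding.
sketch_embed = gaussian_embed(cols, 8, rng)        # 8 x 12

# Design 2: column sampling followed by row sampling.
sketch_rows, row_idx = sample_rows(cols, 10, rng)  # 10 x 12
```

In both cases the downstream algorithm would operate on a matrix far smaller than the original 40 x 60 data, which is where the complexity and memory savings described in the abstract come from.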
Pages: 1580 - 1594
Number of pages: 15
Related papers
50 in total
  • [41] Weighted Outlier Detection of High-Dimensional Categorical Data Using Feature Grouping
    Li, Junli
    Zhang, Jifu
    Pang, Ning
    Qin, Xiao
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (11) : 4295 - 4308
  • [42] Projected outlier detection in high-dimensional mixed-attributes data set
    Ye, Mao
    Li, Xue
    Orlowska, Maria E.
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 7104 - 7113
  • [43] Support high-order tensor data description for outlier detection in high-dimensional big sensor data
    Deng, Xiaowu
    Jiang, Peng
    Peng, Xiaoning
    Mi, Chunqiao
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 81 : 177 - 187
  • [44] Robust support vector data description for outlier detection with noise or uncertain data
    Chen, Guijun
    Zhang, Xueying
    Wang, Zizhong John
    Li, Fenglian
    KNOWLEDGE-BASED SYSTEMS, 2015, 90 : 129 - 137
  • [45] Outlier detection in high-dimensional regression model
    Wang, Tao
    Li, Zhonghua
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2017, 46 (14) : 6947 - 6958
  • [46] A NOVEL TENSOR ALGEBRAIC APPROACH FOR HIGH-DIMENSIONAL OUTLIER DETECTION UNDER DATA MISALIGNMENT
    Fan, Bo
    Zhang, Zemin
    Aeron, Shuchin
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 3628 - 3632
  • [47] A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes
    Koufakou, Anna
    Georgiopoulos, Michael
    DATA MINING AND KNOWLEDGE DISCOVERY, 2010, 20 (02) : 259 - 289
  • [48] A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes
    Anna Koufakou
    Michael Georgiopoulos
    Data Mining and Knowledge Discovery, 2010, 20 : 259 - 289
  • [49] An effective and efficient algorithm for high-dimensional outlier detection
    Charu C. Aggarwal
    Philip S. Yu
    The VLDB Journal, 2005, 14 : 211 - 221
  • [50] An effective and efficient algorithm for high-dimensional outlier detection
    Aggarwal, CC
    Yu, PS
    VLDB JOURNAL, 2005, 14 (02) : 211 - 221