Efficient L1-Norm Principal-Component Analysis via Bit Flipping

被引:100
作者
Markopoulos, Panos P. [1 ]
Kundu, Sandipan [2 ]
Chamadia, Shubham [3 ]
Pados, Dimitris A. [3 ]
机构
[1] Rochester Inst Technol, Dept Elect & Microelect Engn, Rochester, NY 14623 USA
[2] Qualcomm Technol Inc, San Jose, CA 95110 USA
[3] Univ Buffalo State Univ New York, Dept Elect Engn, Buffalo, NY 14260 USA
基金
美国国家科学基金会;
关键词
Data analytics; dimensionality reduction; eigen-decomposition; faulty data; L1; norm; L2; machine learning; outlier resistance; principal component analysis; subspace signal processing; DISCRIMINANT-ANALYSIS; ALGORITHMS; FIT;
D O I
10.1109/TSP.2017.2708023
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
It was shown recently that the K L1-norm principal components (L1-PCs) of a real-valued data matrix X is an element of R-DxN (N data samples of D dimensions) can be exactly calculated with cost O(2(NK)) or, when advantageous, O(NdK-K+1) where d = rank(X), K < d. In applications where X is large (e.g., "big" data of large N and/or "heavy" data of large d), these costs are prohibitive. In this paper, we present a novel suboptimal algorithm for the calculation of the K < d L1-PCs of X of cost O(NDmin{N, D} + (NK2)-K-2 (K-2 + d)), which is comparable to that of standard L2-norm PC analysis. Our theoretical and experimental studies show that the proposed algorithm calculates the exact optimal L1-PCs with high frequency and achieves higher value in the L1-PC optimization metric than any known alternative algorithm of comparable computational cost. The superiority of the calculated L1-PCs over standard L2-PCs (singular vectors) in characterizing potentially faulty data/measurements is demonstrated with experiments in data dimensionality reduction and disease diagnosis from genomic data.
引用
收藏
页码:4252 / 4264
页数:13
相关论文
共 51 条
[1]  
[Anonymous], 2001, Pattern Classification
[2]  
Barnett V., 1994, Outliers in statistical data
[3]  
BARRODAL.I, 1968, ROY STAT SOC C-APP, V17, P51
[4]   IMPROVED ALGORITHM FOR DISCRETE L1 LINEAR-APPROXIMATION [J].
BARRODALE, I ;
ROBERTS, FDK .
SIAM JOURNAL ON NUMERICAL ANALYSIS, 1973, 10 (05) :839-848
[5]  
Bishop C.M., 2006, PATTERN RECOGN, V4, P738, DOI DOI 10.1117/1.2819119
[6]   A pure L1-norm principal component analysis [J].
Brooks, J. P. ;
Dula, J. H. ;
Boone, E. L. .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2013, 61 :83-98
[7]   The L1-norm best-fit hyperplane problem [J].
Brooks, J. P. ;
Dula, J. H. .
APPLIED MATHEMATICS LETTERS, 2013, 26 (01) :51-55
[8]   Robust Principal Component Analysis? [J].
Candes, Emmanuel J. ;
Li, Xiaodong ;
Ma, Yi ;
Wright, John .
JOURNAL OF THE ACM, 2011, 58 (03)
[9]   A miRNA expression signature that separates between normal and malignant prostate tissues [J].
Carlsson, Jessica ;
Davidsson, Sabina ;
Helenius, Gisela ;
Karlsson, Mats ;
Lubovac, Zelmina ;
Andren, Ove ;
Olsson, Bjorn ;
Klinga-Levan, Karin .
CANCER CELL INTERNATIONAL, 2011, 11
[10]  
Ding C, 2006, P 23 INT C MACH LEAR, P281, DOI DOI 10.1145/1143844.1143880