Low Rank Approximation with Entrywise l1-Norm Error

Cited by: 35
Authors
Song, Zhao [1]
Woodruff, David P. [2]
Zhong, Peilin [3]
Affiliations
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[2] IBM Almaden Res Ctr, San Jose, CA 95120 USA
[3] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
Source
STOC'17: PROCEEDINGS OF THE 49TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING | 2017
Keywords
Entry-wise $\ell_1$ norm; low rank approximation; robust algorithms; sketching; numerical linear algebra; PRINCIPAL COMPONENT ANALYSIS; COMPUTATIONAL-COMPLEXITY; DECISION PROBLEM; 1ST-ORDER THEORY; PRELIMINARIES; ALGORITHMS; GEOMETRY; REALS
DOI
10.1145/3055399.3055431
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We study the $\ell_1$-low rank approximation problem, where for a given $n \times d$ matrix $A$ and approximation factor $\alpha \ge 1$, the goal is to output a rank-$k$ matrix $\hat{A}$ for which $\|A - \hat{A}\|_1 \le \alpha \cdot \min_{\text{rank-}k\ \text{matrices}\ A'} \|A - A'\|_1$, where for an $n \times d$ matrix $C$, we let $\|C\|_1 = \sum_{i=1}^{n} \sum_{j=1}^{d} |C_{i,j}|$. This error measure is known to be more robust than the Frobenius norm in the presence of outliers and is indicated in models where Gaussian assumptions on the noise may not apply. The problem was shown to be NP-hard by Gillis and Vavasis and a number of heuristics have been proposed. It was asked in multiple places if there are any approximation algorithms. We give the first provable approximation algorithms for $\ell_1$-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n + d)\,\mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$. If $k$ is constant, we further improve the approximation ratio to $O(1)$ with a $\mathrm{poly}(nd)$-time algorithm. Under the Exponential Time Hypothesis, we show there is no $\mathrm{poly}(nd)$-time algorithm achieving a $(1 + 1/\log^{1+\gamma}(nd))$-approximation, for $\gamma > 0$ an arbitrarily small constant, even when $k = 1$. We give a number of additional results for $\ell_1$-low rank approximation: nearly tight upper and lower bounds for column subset selection, CUR decompositions, extensions to low rank approximation with respect to $\ell_p$-norms for $1 \le p < 2$ and earthmover distance, low-communication distributed protocols and low-memory streaming algorithms, algorithms with limited randomness, and bicriteria algorithms. We also give a preliminary empirical evaluation.
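As a small illustration of the error measure defined in the abstract, the sketch below evaluates the entrywise $\ell_1$ error of a rank-1 candidate against a matrix with one corrupted entry. It is not the paper's algorithm: the helper names (`entrywise_l1_norm`, `outer`, `residual`) and the toy data are assumptions made for this example only.

```python
def entrywise_l1_norm(C):
    """Entrywise l1 norm: the sum of |C[i][j]| over all entries."""
    return sum(abs(x) for row in C for x in row)


def outer(u, v):
    """Rank-1 matrix u v^T as a list of rows (hypothetical helper)."""
    return [[ui * vj for vj in v] for ui in u]


def residual(A, B):
    """Entrywise difference A - B for equally shaped matrices."""
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


# Toy data (assumed): an exactly rank-1 3 x 4 matrix with one outlier added.
u = [1.0, 2.0, 3.0]
v = [1.0, 1.0, 2.0, 2.0]
A = outer(u, v)
A[0][0] += 100.0  # single corrupted entry

# Candidate rank-1 approximation: the uncorrupted factors.
A_hat = outer(u, v)

# Objective value ||A - A_hat||_1 for this candidate.
l1_error = entrywise_l1_norm(residual(A, A_hat))
print(l1_error)
```

An $\alpha$-approximation in the sense of the abstract would be any rank-$k$ matrix whose value of this objective is within a factor $\alpha$ of the minimum over all rank-$k$ matrices; the sketch only evaluates the objective for one hand-picked candidate.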
Pages: 688-701
Number of pages: 14