Low-Rank Approximation and Regression in Input Sparsity Time

Cited by: 158
Authors
Clarkson, Kenneth L. [1 ,2 ]
Woodruff, David P. [1 ,2 ]
Affiliations
[1] IBM Res, Almaden, CA 95120 USA
[2] IBM Res Almaden, 650 Harry Rd, San Jose, CA 95120 USA
Keywords
Algorithms; Theory; Matrices; Randomized; Approximation; Monte Carlo algorithms
DOI
10.1145/3019134
CLC Number
TP3 [Computing technology, computer technology]
Subject Classification Code
0812
Abstract
We design a new distribution over $m \times n$ matrices $S$ so that, for any fixed $n \times d$ matrix $A$ of rank $r$, with probability at least 9/10, $\|SAx\|_2 = (1 \pm \epsilon)\|Ax\|_2$ simultaneously for all $x \in \mathbb{R}^d$. Here, $m$ is bounded by a polynomial in $r\epsilon^{-1}$, and the parameter $\epsilon \in (0, 1]$. Such a matrix $S$ is called a subspace embedding. Furthermore, $SA$ can be computed in $O(\mathrm{nnz}(A))$ time, where $\mathrm{nnz}(A)$ is the number of nonzero entries of $A$. This improves over all previous subspace embeddings, for which computing $SA$ required at least $\Omega(nd \log d)$ time. We call these $S$ sparse embedding matrices. Using our sparse embedding matrices, we obtain the fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and $\ell_p$ regression. More specifically, let $b$ be an $n \times 1$ vector, $\epsilon > 0$ a small enough value, and integers $k, p \ge 1$. Our results include the following.
- Regression: The regression problem is to find a $d \times 1$ vector $x'$ for which $\|Ax' - b\|_p \le (1 + \epsilon) \min_x \|Ax - b\|_p$. For the Euclidean case $p = 2$, we obtain an algorithm running in $O(\mathrm{nnz}(A)) + \tilde{O}(d^3 \epsilon^{-2})$ time, and another in $O(\mathrm{nnz}(A)\log(1/\epsilon)) + \tilde{O}(d^3 \log(1/\epsilon))$ time. (Here, $\tilde{O}(f) = f \cdot \log^{O(1)}(f)$.) For $p \in [1, \infty)$ more generally, we obtain an algorithm running in $O(\mathrm{nnz}(A)\log n) + O(r\epsilon^{-1})^C$ time, for a fixed constant $C$.
- Low-rank approximation: We give an algorithm to obtain a rank-$k$ matrix $\hat{A}_k$ such that $\|A - \hat{A}_k\|_F \le (1 + \epsilon)\|A - A_k\|_F$, where $A_k$ is the best rank-$k$ approximation to $A$. (That is, $A_k$ is the output of principal components analysis, produced by a truncated singular value decomposition, useful for latent semantic indexing and many other statistical problems.) Our algorithm runs in $O(\mathrm{nnz}(A)) + \tilde{O}(nk^2\epsilon^{-4} + k^3\epsilon^{-5})$ time.
- Leverage scores: We give an algorithm to estimate the leverage scores of $A$, up to a constant factor, in $O(\mathrm{nnz}(A)\log n) + \tilde{O}(r^3)$ time.
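The sparse embedding $S$ of this paper is a CountSketch-style matrix: each column has a single nonzero entry, a random $\pm 1$ placed in a uniformly random row, which is why $SA$ costs only $O(\mathrm{nnz}(A))$ time. A minimal NumPy sketch of this construction and of sketch-and-solve least squares (dimensions, seed, and variable names are illustrative, not from the paper):

```python
import numpy as np

def sparse_embed(M, m, rng):
    """Apply a CountSketch-style sparse embedding with m rows to M.

    Each column of the implicit m x n matrix S has one nonzero entry:
    a random +/-1 in a uniformly random row. Computing S @ M therefore
    costs O(nnz(M)) arithmetic, without ever materializing S.
    """
    n = M.shape[0]
    rows = rng.integers(0, m, size=n)        # hash each input row to an output row
    signs = rng.choice([-1.0, 1.0], size=n)  # random sign per input row
    SM = np.zeros((m, M.shape[1]))
    np.add.at(SM, rows, signs[:, None] * M)  # unbuffered scatter-add: S @ M
    return SM

rng = np.random.default_rng(0)
n, d = 20000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Sketch A and b together so both are multiplied by the same S.
SAb = sparse_embed(np.column_stack([A, b]), m=2000, rng=rng)
SA, Sb = SAb[:, :d], SAb[:, d]

# Norm preservation for one fixed direction x (the subspace embedding
# property guarantees this simultaneously for all x, w.h.p.).
x = rng.standard_normal(d)
ratio = np.linalg.norm(SA @ x) / np.linalg.norm(A @ x)

# Sketch-and-solve regression: solve the small m x d problem instead of
# the full n x d one, losing only a (1 + eps) factor in residual norm.
x_opt = np.linalg.lstsq(A, b, rcond=None)[0]
x_sk = np.linalg.lstsq(SA, Sb, rcond=None)[0]
err_opt = np.linalg.norm(A @ x_opt - b)
err_sk = np.linalg.norm(A @ x_sk - b)
```

With $m = 2000 \gg d^2$, the ratio above concentrates near 1 and the sketched solution's residual is close to optimal; the paper's full algorithms combine such embeddings with further steps to reach the stated running times.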
Pages: 45