Dictionary Learning and Tensor Decomposition via the Sum-of-Squares Method

Cited: 78
Authors
Barak, Boaz [1 ]
Kelner, Jonathan A. [2 ]
Steurer, David [3 ]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
[2] MIT, Dept Math, Cambridge, MA 02139 USA
[3] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
Source
STOC'15: PROCEEDINGS OF THE 2015 ACM SYMPOSIUM ON THEORY OF COMPUTING | 2015
Funding
U.S. National Science Foundation;
Keywords
sparse coding; dictionary learning; sum-of-squares method; semidefinite programming; machine learning; unsupervised learning; statistical recovery; approximation algorithms; tensor optimization; polynomial optimization; SPARSE;
DOI
10.1145/2746539.2746605
Chinese Library Classification
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
We give a new approach to the dictionary learning (also known as "sparse coding") problem of recovering an unknown n x m matrix A (for m >= n) from examples of the form y = Ax + e, where x is a random vector in R^m with at most τm nonzero coordinates, and e is a random noise vector in R^n with bounded magnitude. For the case m = O(n), our algorithm recovers every column of A within arbitrarily good constant accuracy in time m^O(log m / log(τ^(-1))), in particular achieving polynomial time if τ = m^(-δ) for any δ > 0, and quasipolynomial time m^O(log m) if τ is (a sufficiently small) constant. Prior algorithms with comparable assumptions on the distribution required the vector x to be much sparser (at most √n nonzero coordinates), and there were intrinsic barriers preventing these algorithms from applying for denser x. We achieve this by designing an algorithm for noisy tensor decomposition that can recover, under quite general conditions, an approximate rank-one decomposition of a tensor T, given access to a tensor T' that is τ-close to T in spectral norm (when considered as a matrix). To our knowledge, this is the first algorithm for tensor decomposition that works in the constant spectral-norm noise regime, where there is no guarantee that the local optima of T and T' have similar structures. Our algorithm is based on a novel approach to using and analyzing the Sum of Squares semidefinite programming hierarchy (Parrilo 2000, Lasserre 2001), and it can be viewed as an indication of the utility of this very general and powerful tool for unsupervised learning problems.
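The generative model in the abstract, y = Ax + e with a τm-sparse coefficient vector x, can be sketched as follows. The concrete dimensions, the sign-valued nonzero entries, and the Gaussian noise are illustrative assumptions for this sketch only; the paper's actual distributional conditions on x and e are more general.

```python
import numpy as np

# Illustrative parameters (the paper assumes m >= n and sparsity tau*m).
n, m, tau = 64, 128, 0.1
rng = np.random.default_rng(0)

# Unknown dictionary A: an n x m matrix with unit-norm columns.
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)

def sample_example(noise_scale=0.01):
    """Draw one example y = A x + e, where x has at most tau*m nonzeros."""
    k = int(tau * m)
    support = rng.choice(m, size=k, replace=False)
    x = np.zeros(m)
    x[support] = rng.choice([-1.0, 1.0], size=k)  # simple sparse distribution (signs)
    e = noise_scale * rng.standard_normal(n)      # bounded-magnitude noise
    return A @ x + e, x

y, x = sample_example()
```

The recovery task is then to approximate the columns of A (up to sign and permutation) given only many independent samples y, without observing x or e.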
Pages: 143-151
Page count: 9
Related references
39 in total
[1]
Agarwal, A., et al. (2013). CoRR, abs/1310.7991.
[2]
Agarwal, A., et al. (2013). CoRR, abs/1309.1952.
[3]
Anandkumar, A., et al. (2012). Advances in Neural Information Processing Systems, p. 926.
[4]
Anandkumar, A., et al. (2013). arXiv:1308.6723.
[5]
[Anonymous] (2012). JMLR Workshop and Conference Proceedings.
[6]
[Anonymous] (2007). Multi-Task Feature Learning. DOI 10.7551/mitpress/7503.003.0010.
[7]
[Anonymous] (2000). Appl. Optim. DOI 10.1007/978-1-4757-3216-0_17.
[8]
[Anonymous] (2007). Proceedings of the 20th International Conference on Neural Information Processing Systems.
[9]
Harshman, R. A. (1970). Foundations of the PARAFAC procedure: models and conditions for an explanatory multi-modal factor analysis. UCLA Working Papers in Phonetics, 16(10), 1-84.