Tensor Decompositions for Learning Latent Variable Models (A Survey for ALT)

Cited by: 16
Authors
Anandkumar, Anima [1 ]
Ge, Rong [2 ]
Hsu, Daniel [3 ]
Kakade, Sham M. [4 ]
Telgarsky, Matus [5 ]
Affiliations
[1] Univ Calif Irvine, Irvine, CA USA
[2] Microsoft Res New England, Cambridge, MA USA
[3] Columbia Univ, New York, NY USA
[4] Rutgers State Univ, New Brunswick, NJ 08903 USA
[5] Univ Michigan, Ann Arbor, MI 48109 USA
Source
ALGORITHMIC LEARNING THEORY, ALT 2015 | 2015 / Vol. 9355
Keywords
INDEPENDENT COMPONENT ANALYSIS; APPROXIMATION
DOI
10.1007/978-3-319-24486-0_2
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This note is a short version of [1], intended as a survey for the 2015 Algorithmic Learning Theory (ALT) conference. The work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models, including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation, which exploits a certain tensor structure in their low-order observable moments (typically, of second and third order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (as in the matrix case). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
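The tensor power method summarized above admits a compact illustration. The following is a minimal NumPy sketch, assuming an exactly orthogonally decomposable third-order tensor with positive weights; the function names, restart counts, and tolerances are illustrative assumptions, not the authors' implementation, and the paper's robustness analysis (the tensor analogue of Wedin's theorem) concerns the perturbed setting that this sketch ignores.

import numpy as np

def tensor_apply(T, u):
    # Contract a symmetric third-order tensor with u along modes 2 and 3: T(I, u, u).
    return np.einsum('ijk,j,k->i', T, u, u)

def tensor_power_method(T, n_restarts=10, n_iters=100, tol=1e-10):
    # Recover {lambda_i, v_i} from T ~= sum_i lambda_i v_i (x) v_i (x) v_i
    # (orthonormal v_i, lambda_i > 0) by power iteration with deflation.
    # Hypothetical names and defaults, for illustration only.
    d = T.shape[0]
    T = T.copy()
    eigvals, eigvecs = [], []
    for _ in range(d):
        best_u, best_lam = None, -np.inf
        for _ in range(n_restarts):                # random restarts
            u = np.random.randn(d)
            u /= np.linalg.norm(u)
            for _ in range(n_iters):               # u <- T(I, u, u) / ||T(I, u, u)||
                v = tensor_apply(T, u)
                norm_v = np.linalg.norm(v)
                if norm_v < tol:
                    break
                v /= norm_v
                if np.linalg.norm(v - u) < tol:
                    u = v
                    break
                u = v
            lam = tensor_apply(T, u) @ u           # generalized Rayleigh quotient T(u, u, u)
            if lam > best_lam:
                best_u, best_lam = u, lam
        eigvals.append(best_lam)
        eigvecs.append(best_u)
        # Deflate: subtract the recovered rank-one component before the next round.
        T = T - best_lam * np.einsum('i,j,k->ijk', best_u, best_u, best_u)
    return np.array(eigvals), np.array(eigvecs)

# Usage on a synthetic orthogonally decomposable tensor:
rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # orthonormal components v_i
weights = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
T = sum(w * np.einsum('i,j,k->ijk', v, v, v) for w, v in zip(weights, V.T))
vals, vecs = tensor_power_method(T)                # vals ~ weights (up to ordering)

In the latent variable applications surveyed, T is not given directly but is estimated from second- and third-order observable moments after a whitening step, which is what makes the perturbation analysis essential.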
Pages: 19-38
Page count: 20
References
32 items in total
[1] Anandkumar A., Ge R., Hsu D., Kakade S. M., Telgarsky M. Tensor decompositions for learning latent variable models. Journal of Machine Learning Research, 2014, 15.
[2] Anandkumar A. Conference on Learning Theory (COLT), 2012: 33.
[3] [Anonymous]. 4th Innovations in Theoretical Computer Science (ITCS), 2013.
[4] [Anonymous]. Springer Series in Statistics, 2012. DOI 10.1007/978-1-4612-4946-7.
[5] [Anonymous]. Advances in Neural Information Processing Systems, 2012.
[6] Austin T. On exchangeable random variables and the statistics of large graphs and hypergraphs. Probability Surveys, 2008, 5: 80-145.
[7] Cardoso J. F. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1991: 3109. DOI 10.1109/ICASSP.1991.150113.
[8] Cardoso J. F. 1996 IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 2: 93. DOI 10.1109/ISCAS.1996.540360.
[9] Cattell R. B. Parallel proportional profiles and other principles for determining the choice of factors by rotation. Psychometrika, 1944, 9(4): 267-283.
[10] Chang J. T. Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Mathematical Biosciences, 1996, 137(1): 51-73.