Tensor Decompositions for Learning Latent Variable Models (A Survey for ALT)

Cited by: 16
Authors
Anandkumar, Anima [1 ]
Ge, Rong [2 ]
Hsu, Daniel [3 ]
Kakade, Sham M. [4 ]
Telgarsky, Matus [5 ]
Affiliations
[1] Univ Calif Irvine, Irvine, CA USA
[2] Microsoft Res New England, Cambridge, MA USA
[3] Columbia Univ, New York, NY USA
[4] Rutgers State Univ, New Brunswick, NJ 08903 USA
[5] Univ Michigan, Ann Arbor, MI 48109 USA
Source
ALGORITHMIC LEARNING THEORY, ALT 2015 | 2015 / Vol. 9355
Keywords
INDEPENDENT COMPONENT ANALYSIS; APPROXIMATION
DOI
10.1007/978-3-319-24486-0_2
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This note is a short version of [1], intended as a survey for the 2015 Algorithmic Learning Theory (ALT) conference. The work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models, including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation, which exploits a certain tensor structure in their low-order observable moments (typically, of second and third order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (as in the matrix case). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
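The tensor power method summarized above admits a compact illustration. The following is a minimal NumPy sketch, assuming an exactly orthogonally decomposable third-order tensor with positive weights; the function names, restart counts, and tolerances are illustrative assumptions, not the authors' implementation, and the paper's robustness analysis (the tensor analogue of Wedin's theorem) concerns the perturbed setting that this sketch ignores.

import numpy as np

def tensor_apply(T, u):
    # Contract a symmetric third-order tensor with u along modes 2 and 3: T(I, u, u).
    return np.einsum('ijk,j,k->i', T, u, u)

def tensor_power_method(T, n_restarts=10, n_iters=100, tol=1e-10):
    # Recover {lambda_i, v_i} from T ~= sum_i lambda_i v_i (x) v_i (x) v_i
    # (orthonormal v_i, lambda_i > 0) by power iteration with deflation.
    # Hypothetical names and defaults, for illustration only.
    d = T.shape[0]
    T = T.copy()
    eigvals, eigvecs = [], []
    for _ in range(d):
        best_u, best_lam = None, -np.inf
        for _ in range(n_restarts):                # random restarts
            u = np.random.randn(d)
            u /= np.linalg.norm(u)
            for _ in range(n_iters):               # u <- T(I, u, u) / ||T(I, u, u)||
                v = tensor_apply(T, u)
                norm_v = np.linalg.norm(v)
                if norm_v < tol:
                    break
                v /= norm_v
                if np.linalg.norm(v - u) < tol:
                    u = v
                    break
                u = v
            lam = tensor_apply(T, u) @ u           # generalized Rayleigh quotient T(u, u, u)
            if lam > best_lam:
                best_u, best_lam = u, lam
        eigvals.append(best_lam)
        eigvecs.append(best_u)
        # Deflate: subtract the recovered rank-one component before the next round.
        T = T - best_lam * np.einsum('i,j,k->ijk', best_u, best_u, best_u)
    return np.array(eigvals), np.array(eigvecs)

# Usage on a synthetic orthogonally decomposable tensor:
rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # orthonormal components v_i
weights = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
T = sum(w * np.einsum('i,j,k->ijk', v, v, v) for w, v in zip(weights, V.T))
vals, vecs = tensor_power_method(T)                # vals ~ weights (up to ordering)

In the latent variable applications surveyed, T is not given directly but is estimated from second- and third-order observable moments after a whitening step, which is what makes the perturbation analysis essential.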
Pages: 19-38
Page count: 20
References
32 items in total
[1] Anandkumar A., Ge R., Hsu D., Kakade S. M., Telgarsky M. Tensor decompositions for learning latent variable models. Journal of Machine Learning Research, 2014, 15.
[2] Anandkumar A. Conference on Learning Theory (COLT), 2012: 33.
[3] [Anonymous]. 4th Innovations in Theoretical Computer Science (ITCS), 2013.
[4] [Anonymous]. Springer Series in Statistics, 2012. DOI 10.1007/978-1-4612-4946-7.
[5] [Anonymous]. Advances in Neural Information Processing Systems, 2012.
[6] Austin T. On exchangeable random variables and the statistics of large graphs and hypergraphs. Probability Surveys, 2008, 5: 80-145.
[7] Cardoso J. F. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1991: 3109. DOI 10.1109/ICASSP.1991.150113.
[8] Cardoso J. F. 1996 IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 2: 93. DOI 10.1109/ISCAS.1996.540360.
[9] Cattell R. B. Parallel proportional profiles and other principles for determining the choice of factors by rotation. Psychometrika, 1944, 9(4): 267-283.
[10] Chang J. T. Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Mathematical Biosciences, 1996, 137(1): 51-73.