Bayesian Factorizations of Big Sparse Tensors

Cited by: 38
Authors
Zhou, Jing [1 ]
Bhattacharya, Anirban [2 ]
Herring, Amy H. [3 ]
Dunson, David B. [4 ]
Affiliations
[1] Univ N Carolina, Sch Publ Hlth, Chapel Hill, NC 27599 USA
[2] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
[3] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
[4] Duke Univ, Dept Stat Sci, Durham, NC 27708 USA
Funding
U.S. National Institutes of Health
Keywords
Bayesian; Categorical data; Contingency table; Log-linear model; Low rank; PARAFAC; Sparsity; Tensor factorization; POSTERIOR DISTRIBUTIONS; LINEAR-MODELS; ASYMPTOTIC NORMALITY; VARIABLE-SELECTION; CONVERGENCE-RATES; REGRESSION; CONSISTENCY; CONTRACTION; SHRINKAGE; INFERENCE;
DOI
10.1080/01621459.2014.983233
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Codes
020208; 070103; 0714
Abstract
It has become routine to collect data that are structured as multiway arrays (tensors). There is an enormous literature on low-rank and sparse matrix factorizations, but limited consideration of extensions to the tensor case in statistics. The most common low-rank tensor factorization relies on parallel factor analysis (PARAFAC), which expresses a rank-k tensor as a sum of rank-one tensors. In contingency table applications in which the sample size is far smaller than the number of cells in the table, the low-rank assumption alone is not sufficient and PARAFAC has poor performance. We induce an additional layer of dimension reduction by allowing the effective rank to vary across dimensions of the table. Taking a Bayesian approach, we place priors on terms in the factorization and develop an efficient Gibbs sampler for posterior computation. Theory is provided showing posterior concentration rates in high-dimensional settings, and the methods are shown to have excellent performance in simulations and several real data applications.
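The PARAFAC representation the abstract refers to can be illustrated concretely. The sketch below (a minimal NumPy illustration, not the paper's implementation; all variable names are hypothetical) builds a rank-k probability tensor for a small three-way contingency table as a weighted sum of k rank-one tensors, where the weights and each mode's factors are probability vectors, so the result is a valid tensor of cell probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions of a small 3-way contingency table and an assumed rank k.
d1, d2, d3, k = 4, 3, 5, 2

# PARAFAC factors for categorical data: a probability vector of
# component weights, plus one probability vector per mode and component.
lam = rng.dirichlet(np.ones(k))            # component weights, sum to 1
psi1 = rng.dirichlet(np.ones(d1), size=k)  # k x d1, each row sums to 1
psi2 = rng.dirichlet(np.ones(d2), size=k)  # k x d2
psi3 = rng.dirichlet(np.ones(d3), size=k)  # k x d3

# Rank-k tensor as a sum of k rank-one tensors (outer products).
pi = sum(lam[h] * np.einsum('i,j,l->ijl', psi1[h], psi2[h], psi3[h])
         for h in range(k))

# Because the weights and factors are probability vectors, pi is a
# valid cell-probability tensor for the d1 x d2 x d3 table.
assert pi.shape == (d1, d2, d3)
assert np.isclose(pi.sum(), 1.0)
```

Here a single rank k is shared by all three modes; the paper's extra layer of dimension reduction corresponds to letting the effective number of components differ across the dimensions of the table rather than fixing one global k.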
Pages: 1562-1576
Page count: 15