LIKELIHOOD ESTIMATION OF SPARSE TOPIC DISTRIBUTIONS IN TOPIC MODELS AND ITS APPLICATIONS TO WASSERSTEIN DOCUMENT DISTANCE CALCULATIONS

Cited: 1
Authors
Bing, Xin [1 ]
Bunea, Florentina [2 ]
Strimas-Mackey, Seth [2]
Wegkamp, Marten [3 ,4 ]
Affiliations
[1] Univ Toronto, Dept Stat Sci, Toronto, ON, Canada
[2] Cornell Univ, Dept Stat & Data Sci, Ithaca, NY USA
[3] Cornell Univ, Dept Math, Ithaca, NY USA
[4] Cornell Univ, Dept Stat & Data Sci, Ithaca, NY USA
Keywords
Adaptive estimation; high-dimensional estimation; maximum likelihood estimation; minimax estimation; multinomial distribution; mixture model; sparse estimation; nonnegative matrix factorization; topic models; anchor words;
DOI
10.1214/22-AOS2229
CLC classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
This paper studies the estimation of high-dimensional, discrete, possibly sparse, mixture models in the context of topic models. The data consist of observed multinomial counts of p words across n independent documents. In topic models, the p × n expected word-frequency matrix is assumed to factorize as the product of a p × K word-topic matrix A and a K × n topic-document matrix T. Since the columns of both matrices represent conditional probabilities belonging to probability simplices, the columns of A are viewed as p-dimensional mixture components common to all documents, while the columns of T are viewed as K-dimensional mixture weights that are document specific and allowed to be sparse. The main interest is to provide sharp, finite-sample, ℓ1-norm convergence rates for estimators of the mixture weights T when A is either known or unknown. For known A, we suggest maximum likelihood estimation of T. Our nonstandard analysis of the MLE not only establishes its ℓ1 convergence rate, but also reveals a remarkable property: the MLE, with no extra regularization, can be exactly sparse and contain the true zero pattern of T. We further show that the MLE is both minimax optimal and adaptive to the unknown sparsity in a large class of sparse topic distributions. When A is unknown, we estimate T by optimizing the likelihood function corresponding to a plug-in, generic estimator Â of A. For any estimator Â that satisfies carefully detailed conditions for proximity to A, we show that the resulting estimator of T retains the properties established for the MLE. Our theoretical results allow the ambient dimensions K and p to grow with the sample sizes. Our main application is to the estimation of 1-Wasserstein distances between document-generating distributions. We propose, estimate and analyze new 1-Wasserstein distances between alternative probabilistic document representations, at the word and topic level, respectively.
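As a rough illustration of the likelihood step described above (a generic sketch only; the function name and the EM-style multiplicative update scheme are illustrative assumptions, not the paper's procedure or analysis): for a single document with word counts x and a known word-topic matrix A, the multinomial likelihood in the simplex-constrained weights t can be maximized as follows.

```python
import numpy as np

def mle_topic_weights(x, A, n_iter=500, tol=1e-10):
    """Maximize the multinomial log-likelihood sum_j x_j * log((A t)_j)
    over the probability simplex, via EM-style multiplicative updates.
    Illustrative sketch; not the authors' estimator or analysis."""
    p, K = A.shape
    t = np.full(K, 1.0 / K)          # start at the simplex barycenter
    n = x.sum()
    for _ in range(n_iter):
        denom = A @ t                # fitted word probabilities, length p
        mask = x > 0                 # only observed words contribute
        t_new = t * (A[mask].T @ (x[mask] / denom[mask])) / n
        if np.abs(t_new - t).sum() < tol:
            return t_new
        t = t_new
    return t
```

Each update keeps t on the simplex by construction, and words with zero weight in every active topic drive the corresponding entries of t toward zero, loosely mirroring the sparsity phenomenon discussed in the abstract.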
We derive finite-sample bounds on the proposed 1-Wasserstein distance estimates. For word-level document distances, we contrast these with existing rates for the 1-Wasserstein distance between standard empirical frequency estimates. The effectiveness of the proposed 1-Wasserstein distances is illustrated by an analysis of an IMDb movie reviews data set. Finally, our theoretical results are supported by extensive simulation studies.
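The word-level 1-Wasserstein distance referred to above is the optimal-transport cost between two word-frequency vectors under a ground metric on words. A minimal self-contained sketch via the transport linear program (the solver choice and the toy cost matrix in the example are illustrative assumptions, not the paper's construction):

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein1(mu, nu, C):
    """1-Wasserstein distance between discrete probability vectors
    mu, nu (length p) under ground-cost matrix C, computed as
        min_P <P, C>  s.t.  P 1 = mu,  P^T 1 = nu,  P >= 0,
    with the coupling P flattened row-major into a length-p^2 LP."""
    p = len(mu)
    A_eq = np.zeros((2 * p, p * p))
    for i in range(p):
        A_eq[i, i * p:(i + 1) * p] = 1.0   # row sums of P equal mu
        A_eq[p + i, i::p] = 1.0            # column sums of P equal nu
    b_eq = np.concatenate([mu, nu])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun
```

The dense LP is only practical for small vocabularies; it is meant to make the definition concrete, not to scale to the p considered in the paper.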
Pages: 3307-3333
Page count: 27