Neural Variational Correlated Topic Modeling

Cited by: 30
Authors
Liu, Luyang [1 ,2 ]
Huang, Heyan [1 ,3 ]
Gao, Yang [1 ]
Wei, Xiaochi [4 ]
Zhang, Yongfeng [5 ]
Affiliations
[1] Beijing Inst Technol, Dept Comp Sci, Beijing, Peoples R China
[2] Beijing Engn Res Ctr High Volume Language Informa, Beijing, Peoples R China
[3] Zhejiang Lab, Beijing, Peoples R China
[4] Baidu Inc, Beijing, Peoples R China
[5] Rutgers State Univ, Dept Comp Sci, New Brunswick, NJ USA
Source
WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019) | 2019
Funding
National Natural Science Foundation of China;
Keywords
Natural language processing; topic model; neural variational inference;
DOI
10.1145/3308558.3313561
CLC number (Chinese Library Classification)
TP301 [Theory, Methods];
Discipline code
081202 ;
Abstract
With the rapid development of the Internet, millions of documents, such as news articles and web pages, are generated every day. Mining the topics and knowledge in them has attracted considerable interest in both academia and industry. As one of the prevalent unsupervised data mining tools, topic models are usually formulated as probabilistic generative models for large collections of text. Traditional probabilistic topic models seek closed-form solutions for model parameters and approach the intractable posteriors via approximation methods, which often leads to inaccurate parameter inference and low efficiency on very large volumes of data. Recently, the emerging approach of neural variational inference has been shown to overcome these issues, offering a scalable and powerful deep generative framework for modeling latent topics via neural networks. Interestingly, a common assumption in most neural variational topic models is that topics are independent of and irrelevant to each other. However, this assumption is unreasonable in many practical scenarios. In this paper, we propose a novel Centralized Transformation Flow to capture the correlations among topics by reshaping topic distributions. Furthermore, we present the Transformation Flow Lower Bound to improve the performance of the proposed model. Extensive experiments on two standard benchmark datasets validate the effectiveness of the proposed approach.
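To make the abstract's pipeline concrete, the following is a minimal NumPy sketch of generic neural variational topic inference: an amortized encoder maps a bag-of-words vector to a Gaussian over latent topics, the reparameterization trick draws a sample, and a single planar-flow step (a standard invertible transformation, not the paper's Centralized Transformation Flow) reshapes the sample so topic dimensions can become correlated before a softmax produces the document-topic distribution. All function names, weight shapes, and the choice of a planar flow are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(bow, W_mu, W_logvar):
    # Amortized inference network: map a bag-of-words vector to the
    # mean and log-variance of a diagonal Gaussian over latent topics.
    return bow @ W_mu, bow @ W_logvar

def reparameterize(mu, logvar):
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I),
    # so gradients can flow through the sampling step.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def planar_flow(z, u, w, b):
    # One planar-flow step f(z) = z + u * tanh(w.z + b). Stacking such
    # invertible maps warps the factorized Gaussian posterior so that
    # topic dimensions are no longer forced to be independent.
    return z + u * np.tanh(z @ w + b)

def topic_distribution(z):
    # Softmax turns the transformed latent vector into a normalized
    # document-topic distribution.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

V, K = 50, 5  # vocabulary size, number of topics (toy values)
bow = rng.poisson(1.0, size=(1, V)).astype(float)
W_mu = rng.normal(0.0, 0.1, size=(V, K))
W_logvar = rng.normal(0.0, 0.1, size=(V, K))
u, w, b = rng.normal(0.0, 0.1, size=K), rng.normal(0.0, 0.1, size=K), 0.0

mu, logvar = encoder(bow, W_mu, W_logvar)
z = reparameterize(mu, logvar)
theta = topic_distribution(planar_flow(z, u, w, b))
print(theta.shape, float(theta.sum()))
```

In a real training loop the encoder weights and flow parameters would be learned jointly by maximizing a variational lower bound (the paper's Transformation Flow Lower Bound plays this role), which the sketch omits.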
Pages: 1142-1152
Page count: 11