Neural Variational Correlated Topic Modeling

Cited by: 30
Authors
Liu, Luyang [1 ,2 ]
Huang, Heyan [1 ,3 ]
Gao, Yang [1 ]
Wei, Xiaochi [4 ]
Zhang, Yongfeng [5 ]
Affiliations
[1] Beijing Inst Technol, Dept Comp Sci, Beijing, Peoples R China
[2] Beijing Engn Res Ctr High Volume Language Informa, Beijing, Peoples R China
[3] Zhejiang Lab, Beijing, Peoples R China
[4] Baidu Inc, Beijing, Peoples R China
[5] Rutgers State Univ, Dept Comp Sci, New Brunswick, NJ USA
Source
WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019) | 2019
Funding
National Natural Science Foundation of China;
Keywords
Natural language processing; topic model; neural variational inference;
DOI
10.1145/3308558.3313561
CLC number (Chinese Library Classification)
TP301 [Theory, Methods];
Discipline code
081202 ;
Abstract
With the rapid development of the Internet, millions of documents, such as news articles and web pages, are generated every day. Mining the topics and knowledge in them has attracted considerable interest in both academia and industry. As one of the prevalent unsupervised data mining tools, topic models are usually formulated as probabilistic generative models for large collections of text. Traditional probabilistic topic models seek closed-form solutions for model parameters and approach the intractable posteriors via approximation methods, which often leads to inaccurate parameter inference and low efficiency on very large volumes of data. Recently, the emerging approach of neural variational inference has been shown to overcome these issues, offering a scalable and powerful deep generative framework for modeling latent topics via neural networks. Interestingly, a common assumption in most neural variational topic models is that topics are independent of and irrelevant to each other. However, this assumption is unreasonable in many practical scenarios. In this paper, we propose a novel Centralized Transformation Flow to capture the correlations among topics by reshaping topic distributions. Furthermore, we present the Transformation Flow Lower Bound to improve the performance of the proposed model. Extensive experiments on two standard benchmark datasets validate the effectiveness of the proposed approach.
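To make the abstract's pipeline concrete, the following is a minimal NumPy sketch of generic neural variational topic inference: an amortized encoder maps a bag-of-words vector to a Gaussian over latent topics, the reparameterization trick draws a sample, and a single planar-flow step (a standard invertible transformation, not the paper's Centralized Transformation Flow) reshapes the sample so topic dimensions can become correlated before a softmax produces the document-topic distribution. All function names, weight shapes, and the choice of a planar flow are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(bow, W_mu, W_logvar):
    # Amortized inference network: map a bag-of-words vector to the
    # mean and log-variance of a diagonal Gaussian over latent topics.
    return bow @ W_mu, bow @ W_logvar

def reparameterize(mu, logvar):
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I),
    # so gradients can flow through the sampling step.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def planar_flow(z, u, w, b):
    # One planar-flow step f(z) = z + u * tanh(w.z + b). Stacking such
    # invertible maps warps the factorized Gaussian posterior so that
    # topic dimensions are no longer forced to be independent.
    return z + u * np.tanh(z @ w + b)

def topic_distribution(z):
    # Softmax turns the transformed latent vector into a normalized
    # document-topic distribution.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

V, K = 50, 5  # vocabulary size, number of topics (toy values)
bow = rng.poisson(1.0, size=(1, V)).astype(float)
W_mu = rng.normal(0.0, 0.1, size=(V, K))
W_logvar = rng.normal(0.0, 0.1, size=(V, K))
u, w, b = rng.normal(0.0, 0.1, size=K), rng.normal(0.0, 0.1, size=K), 0.0

mu, logvar = encoder(bow, W_mu, W_logvar)
z = reparameterize(mu, logvar)
theta = topic_distribution(planar_flow(z, u, w, b))
print(theta.shape, float(theta.sum()))
```

In a real training loop the encoder weights and flow parameters would be learned jointly by maximizing a variational lower bound (the paper's Transformation Flow Lower Bound plays this role), which the sketch omits.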
Pages: 1142-1152
Page count: 11