Asynchronous distributed estimation of topic models for document analysis

Cited by: 8
Authors
Asuncion, Arthur U. [1 ]
Smyth, Padhraic [1 ]
Welling, Max [1 ]
Affiliations
[1] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92717 USA
Keywords
Topic model; Distributed learning; Parallelization; Gibbs sampling
DOI
10.1016/j.stamet.2010.03.002
Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Given the prevalence of large data sets and the availability of inexpensive parallel computing hardware, there is significant motivation to explore distributed implementations of statistical learning algorithms. In this paper, we present a distributed learning framework for Latent Dirichlet Allocation (LDA), a well-known Bayesian latent variable model for sparse matrices of count data. In the proposed approach, data are distributed across P processors, and processors independently perform inference on their local data and communicate their sufficient statistics in a local asynchronous manner with other processors. We apply two different approximate inference techniques for LDA, collapsed Gibbs sampling and collapsed variational inference, within a distributed framework. The results show significant improvements in computation time and memory when running the algorithms on very large text corpora using parallel hardware. Despite the approximate nature of the proposed approach, simulations suggest that asynchronous distributed algorithms are able to learn models that are nearly as accurate as those learned by the standard non-distributed approaches. We also find that our distributed algorithms converge rapidly to good solutions. (C) 2010 Elsevier B.V. All rights reserved.
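The abstract describes processors that each run collapsed Gibbs sampling on their local documents and asynchronously exchange word-topic count statistics with peers. A minimal single-process sketch of that idea follows; the `Worker` class, the tiny corpus, and the synchronous exchange loop are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

class Worker:
    """One simulated processor in asynchronous distributed LDA (illustrative sketch).

    Each worker Gibbs-samples topic assignments for its own documents against a
    possibly stale view of the other workers' word-topic counts.
    """

    def __init__(self, docs, K, V, alpha, beta, seed):
        self.docs, self.K, self.V = docs, K, V
        self.alpha, self.beta = alpha, beta
        self.rng = np.random.default_rng(seed)
        # random initial topic assignment for every token
        self.z = [[int(self.rng.integers(K)) for _ in doc] for doc in docs]
        self.ndk = np.zeros((len(docs), K))   # document-topic counts
        self.nkw_local = np.zeros((K, V))     # this worker's word-topic counts
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = self.z[d][i]
                self.ndk[d, k] += 1
                self.nkw_local[k, w] += 1
        self.peer_cache = {}                  # peer id -> last counts received

    def gibbs_sweep(self):
        # Global count estimate = own counts + cached (possibly stale) peer counts;
        # the staleness is the "asynchronous" approximation the abstract refers to.
        nkw = self.nkw_local + sum(self.peer_cache.values(),
                                   np.zeros((self.K, self.V)))
        for d, doc in enumerate(self.docs):
            for i, w in enumerate(doc):
                k = self.z[d][i]
                # remove the token's current assignment from the counts
                self.ndk[d, k] -= 1; self.nkw_local[k, w] -= 1; nkw[k, w] -= 1
                # standard collapsed-Gibbs conditional for LDA
                nk = nkw.sum(axis=1)
                p = ((self.ndk[d] + self.alpha)
                     * (nkw[:, w] + self.beta) / (nk + self.V * self.beta))
                k = int(self.rng.choice(self.K, p=p / p.sum()))
                self.z[d][i] = k
                self.ndk[d, k] += 1; self.nkw_local[k, w] += 1; nkw[k, w] += 1

    def receive(self, peer_id, peer_nkw):
        # Communication step: overwrite the cached statistics for this peer.
        self.peer_cache[peer_id] = peer_nkw.copy()

# Tiny synthetic corpus (token ids into a V=6 vocabulary) split across two workers.
corpus = [[0, 0, 1, 1], [1, 1, 2], [3, 3, 4, 4], [4, 5, 5]]
w0 = Worker(corpus[:2], K=2, V=6, alpha=0.1, beta=0.1, seed=1)
w1 = Worker(corpus[2:], K=2, V=6, alpha=0.1, beta=0.1, seed=2)
for _ in range(20):
    w0.gibbs_sweep(); w1.gibbs_sweep()
    # exchange current local counts (in a real system these are async messages)
    w0.receive(1, w1.nkw_local); w1.receive(0, w0.nkw_local)
```

Because each worker samples against cached peer counts rather than a locked global table, no processor ever blocks on another, which is the property that lets the paper's algorithms scale across P processors at the cost of sampling from slightly out-of-date sufficient statistics.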
Pages: 3-17 (15 pages)
Related papers (50 in total)
[31] Laureate, Caitlin Doogan Poet; Buntine, Wray; Linger, Henry. A systematic review of the use of topic models for short text social media analysis [J]. Artificial Intelligence Review, 2023, 56(12): 14223-14255.
[33] Wu, Yonghui; Ding, Yuxin; Wang, Xiaolong; Xu, Jun. A Comparative Study of Topic Models for Topic Clustering of Chinese Web News [J]. Proceedings of 2010 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT 2010), Vol. 5, 2010: 236-240.
[34] Zahabi, Samira Tofighi; Bakhshaei, Somayeh; Khadivi, Shahram. Using Topic Models in Domain Adaptation [J]. 2014 7th International Symposium on Telecommunications (IST), 2014: 539-543.
[35] Bhadury, Arnab; Chen, Jianfei; Zhu, Jun; Liu, Shixia. Scaling up Dynamic Topic Models [J]. Proceedings of the 25th International Conference on World Wide Web (WWW'16), 2016: 381-390.
[36] Ikegami, Kenshin; Ohsawa, Yukio. PageRank Topic Model: Estimation of Multinomial Distributions using Network Structure Analysis Methods [J]. Fundamenta Informaticae, 2018, 159(3): 257-277.
[37] Wu, Meng-Sung. Bayesian Bridging Topic Models for Classification [J]. Journal of Information Science and Engineering, 2014, 30(5): 1585-1600.
[38] Xia, Linzhong; Luo, Dean; Zhang, Chunxiao; Wu, Zhou. A Survey of Topic Models in Text Classification [J]. 2019 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD 2019), 2019: 244-250.
[39] Yao, Liang; Zhang, Yin; Wei, Baogang; Qian, Hongze; Wang, Yibing. Incorporating Probabilistic Knowledge into Topic Models [J]. Advances in Knowledge Discovery and Data Mining, Part II, 2015, 9078: 586-597.
[40] Zeng, Jia; Cheung, William K.; Liu, Jiming. Learning Topic Models by Belief Propagation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(5): 1121-1134.