Distributed Latent Dirichlet Allocation on Streams

Cited by: 1
Authors
Guo, Yunyan [1]
Li, Jianzhong [1, 2]
Affiliations
[1] Harbin Inst Technol, 92 Xidazhi St, Harbin 15001, Heilongjiang, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Distributed streams; learning system; variational inference; VARIATIONAL INFERENCE; OPTIMIZATION; BURSTY;
DOI
10.1145/3451528
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Latent Dirichlet Allocation (LDA) has been widely used for topic modeling, with applications spanning areas such as natural language processing and information retrieval. While LDA on small, static datasets has been extensively studied, practical scenarios pose several challenges because datasets are often huge and are gathered in a streaming fashion. As the state-of-the-art LDA algorithm on streams, Streaming Variational Bayes (SVB) introduced Bayesian updating to provide a streaming procedure. However, the utility of SVB is limited in applications because it ignores three challenges of processing real-world streams: topic evolution, data turbulence, and real-time inference. In this article, we propose a novel distributed LDA algorithm, referred to as StreamFed-LDA, to deal with these challenges on streams. For topic modeling of streaming data, the ability to capture evolving topics is essential for practical online inference. To achieve this goal, StreamFed-LDA is based on a specialized framework that supports lifelong (continual) learning of evolving topics. In addition, data turbulence is commonly present in streams due to real-life events. In that case, the design of StreamFed-LDA allows the model to learn new characteristics from the most recent data while maintaining the historical information. On massive streaming data, providing real-time inference results is both difficult and crucial. To increase the throughput and reduce the latency, StreamFed-LDA introduces additional techniques that substantially reduce both computation and communication costs in distributed systems. Experiments on four real-world datasets show that the proposed framework achieves significantly better online inference performance than the baselines. At the same time, StreamFed-LDA reduces the latency by orders of magnitude on real-world datasets.
Pages: 20