Distributed Latent Dirichlet Allocation on Streams

Cited by: 1
Authors
Guo, Yunyan [1]
Li, Jianzhong [1, 2]
Affiliations
[1] Harbin Inst Technol, 92 Xidazhi St, Harbin 15001, Heilongjiang, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Distributed streams; learning system; variational inference; VARIATIONAL INFERENCE; OPTIMIZATION; BURSTY;
DOI
10.1145/3451528
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Latent Dirichlet Allocation (LDA) has been widely used for topic modeling, with applications in areas such as natural language processing and information retrieval. While LDA on small, static datasets has been studied extensively, practical scenarios pose several real-world challenges because datasets are often huge and arrive as streams. As the state-of-the-art LDA algorithm on streams, Streaming Variational Bayes (SVB) introduced Bayesian updating to provide a streaming procedure. However, the utility of SVB in applications is limited because it ignores three challenges of processing real-world streams: topic evolution, data turbulence, and real-time inference. In this article, we propose a novel distributed LDA algorithm, referred to as StreamFed-LDA, to address these challenges. For topic modeling of streaming data, the ability to capture evolving topics is essential for practical online inference. To achieve this goal, StreamFed-LDA is based on a specialized framework that supports lifelong (continual) learning of evolving topics. In addition, data turbulence is common in streams because of real-life events. To handle it, StreamFed-LDA is designed so that the model learns new characteristics from the most recent data while retaining historical information. On massive streaming data, providing real-time inference results is both crucial and difficult. To increase throughput and reduce latency, StreamFed-LDA introduces additional techniques that substantially reduce both computation and communication costs in distributed systems. Experiments on four real-world datasets show that the proposed framework achieves significantly better online inference performance than the baselines, while also reducing latency by orders of magnitude on these datasets.
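The Bayesian updating that the abstract attributes to SVB can be illustrated with a small sketch. It assumes a standard mean-field variational treatment of LDA: the topic-word posterior is a Dirichlet with parameters lam[k][w], each minibatch contributes expected topic-word counts computed against the current posterior, and that posterior then serves as the prior for the next minibatch. In the distributed round sketched here, every worker computes its counts against the same snapshot of lam and a coordinator adds them all to the prior. The helper names (digamma, local_stats, svb_round), the bag-of-words input format, and the fixed iteration count are illustrative assumptions. This is not StreamFed-LDA, and it deliberately omits the topic-evolution, turbulence, and latency techniques the article proposes.

```python
import math
from typing import Dict, List

Doc = Dict[int, int]        # assumed input format: word id -> count (bag of words)
Matrix = List[List[float]]  # K x V topic-word Dirichlet parameters


def digamma(x: float) -> float:
    """Digamma for x > 0: shift x up via psi(x) = psi(x + 1) - 1/x, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def local_stats(docs: List[Doc], lam: Matrix, alpha: float, iters: int = 20) -> Matrix:
    """Hypothetical helper: mean-field E-step on one minibatch, returns expected topic-word counts."""
    K, V = len(lam), len(lam[0])
    # E[log beta_kw] under the current Dirichlet posterior, which acts as this batch's prior.
    elog_beta = []
    for row in lam:
        row_total = digamma(sum(row))
        elog_beta.append([digamma(v) - row_total for v in row])
    stats = [[0.0] * V for _ in range(K)]
    for doc in docs:
        gamma = [alpha + sum(doc.values()) / K] * K   # per-document topic Dirichlet
        phi: Dict[int, List[float]] = {}
        for _ in range(iters):
            g_total = digamma(sum(gamma))
            elog_theta = [digamma(g) - g_total for g in gamma]
            new_gamma = [alpha] * K
            for w, c in doc.items():
                scores = [elog_theta[k] + elog_beta[k][w] for k in range(K)]
                top = max(scores)                      # subtract the max for numerical stability
                weights = [math.exp(s - top) for s in scores]
                z = sum(weights)
                phi[w] = [v / z for v in weights]      # topic responsibilities, sum to 1
                for k in range(K):
                    new_gamma[k] += c * phi[w][k]
            gamma = new_gamma
        for w, c in doc.items():
            for k in range(K):
                stats[k][w] += c * phi[w][k]
    return stats


def svb_round(lam: Matrix, worker_batches: List[List[Doc]], alpha: float) -> Matrix:
    """Hypothetical helper: one distributed round, posterior = prior + sum of worker statistics."""
    # Run sequentially here; in a real system each worker would call local_stats in
    # parallel against the same snapshot of lam and send back only its K x V statistics.
    deltas = [local_stats(batch, lam, alpha) for batch in worker_batches]
    return [[lam[k][w] + sum(d[k][w] for d in deltas) for w in range(len(lam[0]))]
            for k in range(len(lam))]


if __name__ == "__main__":
    lam = [[0.5, 0.6, 0.5, 0.4], [0.4, 0.5, 0.6, 0.5]]   # arbitrary initial prior, mass 4.0
    stream = [
        [[{0: 3, 1: 2}], [{2: 4, 3: 1}]],   # round 1: two workers, one document each
        [[{0: 1, 1: 4}], [{2: 2, 3: 3}]],   # round 2
    ]
    for worker_batches in stream:
        lam = svb_round(lam, worker_batches, alpha=0.1)
    print([[round(v, 3) for v in row] for row in lam])
    print(round(sum(map(sum, lam)), 6))    # initial mass plus the 20 tokens processed
```

Because each token's responsibilities sum to one, every round increases the total mass of lam by exactly the number of tokens processed, which makes a convenient sanity check. Note also that the update only ever adds counts, so older data are never down-weighted.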
Pages: 20
Related papers
50 in total
  • [21] Anarchists, Unite: Practical Entropy Approximation for Distributed Streams
    Gabel, Moshe
    Keren, Daniel
    Schuster, Assaf
    KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, : 837 - 846
  • [22] Optimal Random Sampling from Distributed Streams Revisited
    Tirthapura, Srikanta
    Woodruff, David P.
    DISTRIBUTED COMPUTING, 2011, 6950 : 283 - +
  • [23] Distributed Power Allocation for Coordinated Multipoint Transmissions in Distributed Antenna Systems
    Zhang, Xiujun
    Sun, Yin
    Chen, Xiang
    Zhou, Shidong
    Wang, Jing
    Shroff, Ness B.
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2013, 12 (05) : 2281 - 2291
  • [24] Optimal Allocation of Distributed Generation Considering Protection
    Bakr, Hamza M.
    Shaaban, Mostafa F.
    Osman, Ahmed H.
    Sindi, Hatem F.
    ENERGIES, 2020, 13 (09)
  • [25] Improved Convergence Rates for Distributed Resource Allocation
    Nedic, Angelia
    Olshevsky, Alex
    Shi, Wei
    2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 172 - 177
  • [26] Utilizing fuzzy optimization for distributed generation allocation
    Cano, Edwin B.
    TENCON 2007 - 2007 IEEE REGION 10 CONFERENCE, VOLS 1-3, 2007, : 171 - 174
  • [27] DISTRIBUTED LEARNING FOR RESOURCE ALLOCATION UNDER UNCERTAINTY
    Mertikopoulos, Panayotis
    Belmega, E. Veronica
    Sanguinetti, Luca
    2016 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), 2016, : 535 - 539
  • [28] A Distributed Data Allocation Algorithm for Biological Databases
    Tonini, Gustavo
    Siqueira, Frank
    2013 IEEE 16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2013), 2013, : 531 - 537
  • [29] Distributed Differential Evolution With Adaptive Resource Allocation
    Li, Jian-Yu
    Du, Ke-Jing
    Zhan, Zhi-Hui
    Wang, Hua
    Zhang, Jun
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (05) : 2791 - 2804
  • [30] A Novel Distributed Algorithm for Constrained Resource Allocation
    Wang, Xiaochu
    Sun, Changhao
    Sun, Ting
    2019 11TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS (IHMSC 2019), VOL 2, 2019, : 331 - 336