Clustering-Based Online News Topic Detection and Tracking Through Hierarchical Bayesian Nonparametric Models

Cited by: 7
Authors
Fan, Wentao [1]
Guo, Zhiyan [1]
Bouguila, Nizar [2]
Hou, Wenjuan [3]
Affiliations
[1] Huaqiao Univ, Dept Comp Sci & Technol, Xiamen, Fujian, Peoples R China
[2] Concordia Univ, CIISE, Montreal, PQ, Canada
[3] Huaqiao Univ, Instrumental Anal Ctr, Xiamen, Fujian, Peoples R China
Source
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2021
Funding
National Natural Science Foundation of China
Keywords
Clustering; topic detection and tracking; hierarchical Bayesian model; variational inference; selection
DOI
10.1145/3404835.3462982
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we propose a clustering-based online news topic detection and tracking (TDT) approach built on a hierarchical Bayesian nonparametric framework that allows topics to be shared across different news stories in a corpus. Our approach is formulated using the hierarchical Pitman-Yor process mixture model with the inverted Beta-Liouville (IBL) distribution as its component density, which has been shown to model text data better than the widely used Gaussian distribution. Moreover, we theoretically develop a convergence-guaranteed online learning algorithm, based on variational Bayes, that can effectively learn the proposed TDT model from a stream of news stories. The merits of our TDT approach are illustrated by comparing it with other well-defined clustering-based TDT approaches on different news data sets.
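As a rough illustration of the kind of prior the abstract refers to, the sketch below draws truncated mixing weights from a single-level Pitman-Yor process using the standard stick-breaking construction, V_k ~ Beta(1 - d, a + k d). It is not the paper's model: it leaves out the hierarchy, the IBL component density, and the online variational algorithm. The function name, parameter names, and truncation level are assumptions made for this example only.

```python
import random


def pitman_yor_stick_breaking(discount, concentration, truncation, seed=0):
    """Draw truncated mixing weights from a Pitman-Yor stick-breaking prior.

    Hypothetical helper for illustration; not taken from the paper.
    discount (d) must lie in [0, 1) and concentration (a) must exceed -d.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must be greater than -discount")
    if truncation < 1:
        raise ValueError("truncation must be at least 1")

    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for k in range(1, truncation):
        # V_k ~ Beta(1 - d, a + k * d); d = 0 recovers the Dirichlet process.
        v = rng.betavariate(1.0 - discount, concentration + k * discount)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    # The last component absorbs the leftover stick so the weights sum to 1.
    weights.append(remaining)
    return weights


if __name__ == "__main__":
    pi = pitman_yor_stick_breaking(discount=0.5, concentration=1.0, truncation=20)
    print(len(pi), sum(pi))
```

A larger discount tends to spread mass over more components (a heavier, power-law-like tail), which is one common motivation for preferring a Pitman-Yor prior over a Dirichlet process for text.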
Pages: 2126-2130
Page count: 5