Clustering-Based Online News Topic Detection and Tracking Through Hierarchical Bayesian Nonparametric Models

Cited by: 7
Authors
Fan, Wentao [1]
Guo, Zhiyan [1]
Bouguila, Nizar [2]
Hou, Wenjuan [3]
Affiliations
[1] Huaqiao Univ, Dept Comp Sci & Technol, Xiamen, Fujian, Peoples R China
[2] Concordia Univ, CIISE, Montreal, PQ, Canada
[3] Huaqiao Univ, Instrumental Anal Ctr, Xiamen, Fujian, Peoples R China
Source
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2021
Funding
National Natural Science Foundation of China
Keywords
Clustering; topic detection and tracking; hierarchical Bayesian model; variational inference; selection
DOI
10.1145/3404835.3462982
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we propose a clustering-based online news topic detection and tracking (TDT) approach built on a hierarchical Bayesian nonparametric framework that allows topics to be shared across different news stories in a corpus. Our approach is formulated using the hierarchical Pitman-Yor process mixture model with the inverted Beta-Liouville (IBL) distribution as its component density, which has been shown to model text data better than the widely used Gaussian distribution. Moreover, we theoretically develop a convergence-guaranteed online learning algorithm, based on variational Bayes, that can effectively learn the proposed TDT model from a stream of news stories. The merits of our TDT approach are illustrated by comparing it with other well-defined clustering-based TDT approaches on different news data sets.
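For readers unfamiliar with the Pitman-Yor process named in the abstract, the sketch below draws truncated mixture weights from the standard stick-breaking construction of a single-level Pitman-Yor prior. It is only an illustration of that prior, not the authors' hierarchical model: it contains no IBL component densities and no variational inference. The function name, parameter names, and truncation level are hypothetical choices made for this example.

```python
import random


def pitman_yor_stick_breaking(discount, concentration, num_sticks, seed=0):
    """Draw truncated stick-breaking weights for a Pitman-Yor process.

    Illustrative sketch only (names are hypothetical, not from the paper).
    Stick k uses V_k ~ Beta(1 - discount, concentration + k * discount),
    and its weight is V_k times the stick length left by earlier breaks.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # stick length not yet assigned to any component
    for k in range(1, num_sticks + 1):
        v = rng.betavariate(1.0 - discount, concentration + k * discount)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    # 'remaining' is the mass left for components beyond the truncation.
    return weights, remaining


if __name__ == "__main__":
    weights, leftover = pitman_yor_stick_breaking(0.5, 1.0, num_sticks=20)
    print("largest weight:", max(weights))
    print("mass beyond truncation:", leftover)
```

Setting `discount` to 0 reduces the construction to the Dirichlet process; a larger discount leaves more mass in the tail, which gives the heavier-tailed cluster-size behavior usually cited as the reason for choosing a Pitman-Yor prior.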
Pages: 2126-2130
Page count: 5