LDA-based online topic detection using tensor factorization

Cited by: 45
Authors
Guo, Xin [1, 2]
Xiang, Yang [1, 2]
Chen, Qian [3]
Huang, Zhenhua [1, 2]
Hao, Yongtao [1, 2]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Minist Educ, Shanghai 201804, Peoples R China
[3] Shanxi Univ, Key Lab Computat Intelligence & Chinese Informat, Minist Educ, Sch Comp & Informat Technol, Taiyuan, Peoples R China
Keywords
LDA; tensor factorization; topic detection; topic tensor
DOI
10.1177/0165551512473066
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the information retrieval field, effective and efficient extraction of topics from large-scale online text streams is challenging because it is a fully unsupervised learning task without prior knowledge. Most previous studies have focused on how to analyse text corpora to extract topics, rarely considering the time dimension. In the present study, we approached topic detection as a temporal optimization problem. Here, we propose a novel approach to incremental topic detection, called online topic detection using tensor factorization (OTD-TF), which is based on latent Dirichlet allocation (LDA). First, topics are obtained from the corpus in the current time slice using LDA. Second, a topic tensor with a time dimension is constructed to identify the correlations between pairs of topics. Then, approximate topics are merged using tensor factorization (TF). Finally, documents are reallocated to corresponding topic bins. By executing these steps continuously and incrementally, temporal topic detection can be achieved. In theoretical analyses and simulation experiments, OTD-TF outperformed other systems in terms of space and time complexity and achieved a high precision ratio. Our experimental evaluations also revealed interesting temporal patterns in topic emergence, development, extinction, burst and transience.
Pages: 459-469
Number of pages: 11
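
The incremental loop described in the abstract (per-slice topics, merge approximate topics, reallocate documents) can be sketched roughly as follows. This is a minimal illustration and not the authors' OTD-TF algorithm: it assumes the per-slice topic-word distributions have already been produced by some LDA implementation, and it swaps the tensor-factorization merge for a plain cosine-similarity threshold. The names `merge_slice`, `reallocate` and the `threshold` value are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic-word distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def merge_slice(global_topics, slice_topics, threshold=0.9):
    """Fold one time slice's topics into the running list of global topics.

    Assumption: a slice topic is "approximate" to a global topic when their
    cosine similarity reaches `threshold` (a stand-in for the paper's
    tensor-factorization step). Returns, for each slice topic, the index of
    the global topic it was merged into or created as.
    """
    assignment = []
    for topic in slice_topics:
        best, best_sim = None, threshold
        for i, g in enumerate(global_topics):
            sim = cosine(g["dist"], topic)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            # No close match: a new topic emerges in this slice.
            global_topics.append({"dist": list(topic), "count": 1})
            best = len(global_topics) - 1
        else:
            # Close match: update the global topic as a running average.
            g = global_topics[best]
            n = g["count"]
            g["dist"] = [(a * n + b) / (n + 1) for a, b in zip(g["dist"], topic)]
            g["count"] = n + 1
        assignment.append(best)
    return assignment


def reallocate(doc_topic_weights, assignment):
    """Place each document in the global bin of its strongest slice-local topic."""
    bins = []
    for weights in doc_topic_weights:
        local = max(range(len(weights)), key=weights.__getitem__)
        bins.append(assignment[local])
    return bins


# Toy data over a 4-word vocabulary (made up for illustration).
topics = []
slice1 = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
slice2 = [[0.68, 0.22, 0.1, 0.0], [0.1, 0.8, 0.1, 0.0]]

a1 = merge_slice(topics, slice1)   # both topics are new
a2 = merge_slice(topics, slice2)   # first merges into topic 0, second is new
doc_bins = reallocate([[0.9, 0.1], [0.2, 0.8]], a2)
print(a1, a2, doc_bins)
```

Only the global topic list is carried between slices, so earlier documents never need to be revisited; that is the property that makes this kind of scheme incremental.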