Modeling Emerging, Evolving and Fading Topics using Dynamic Soft Orthogonal NMF with Sparse Representation

Cited by: 22
Authors
Chen, Yong [1, 2]
Zhang, Hui [1, 2]
Wu, Junjie [3]
Wang, Xingguang [1]
Liu, Rui [1]
Lin, Mengxiang
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
[2] Natl Sci & Technol Resources Sharing Serv Engn Re, Beijing 100191, Peoples R China
[3] Beihang Univ, Sch Econ & Management, Beijing 100191, Peoples R China
Source
2015 IEEE International Conference on Data Mining (ICDM) | 2015
Keywords
Dynamic Topic Model (DTM); Non-negative Matrix Factorization (NMF); Soft Orthogonality; Sparse Representation; Topic Detection and Tracking (TDT); NONNEGATIVE MATRIX;
DOI
10.1109/ICDM.2015.96
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dynamic topic models (DTM) are of great use to analyze the evolution of unobserved topics of a text collection over time. Recent years have witnessed the explosive growth of streaming text data emerging from online media, which creates an unprecedented need for DTMs for timely event analysis. While there have been some matrix factorization methods in the literature for dynamic topic modeling, further study is still in great need to model emerging, evolving and fading topics in a more natural and effective way. In light of this, we first propose a matrix factorization model called SONMFSR (Soft Orthogonal NMF with Sparse Representation), which makes full use of soft orthogonal and sparsity constraints for static topic modeling. Furthermore, by introducing the constraints of emerging, evolving and fading topics to SONMFSR, we easily obtain a novel DTM called SONMFSRd for dynamic event analysis. Extensive experiments on two public corpora demonstrate the superiority of SONMFSRd to some state-of-the-art DTMs in both topic detection and tracking. In particular, SONMFSRd shows great potential in real-world applications, where popular topics in Two Sessions 2015 are captured and traced dynamically for possible insights.
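As a rough illustration of what "soft orthogonal and sparsity constraints" on an NMF can look like, the sketch below factorizes a non-negative term-document matrix X into W (terms by topics) and H (topics by documents). The abstract does not state the objective function or the update rules, so everything here is an assumption: the objective 0.5*||X - WH||_F^2 + (alpha/2)*||W^T W - I||_F^2 + beta*||H||_1, the heuristic multiplicative updates, and the names `soft_orthogonal_sparse_nmf`, `objective`, `alpha`, and `beta` are illustrative and are not taken from the paper. The dynamic constraints for emerging, evolving and fading topics in SONMFSRd are not covered.

```python
import numpy as np


def objective(X, W, H, alpha, beta):
    """Assumed objective: reconstruction + soft orthogonality on W + L1 sparsity on H."""
    k = W.shape[1]
    recon = 0.5 * np.linalg.norm(X - W @ H, "fro") ** 2
    orth = 0.5 * alpha * np.linalg.norm(W.T @ W - np.eye(k), "fro") ** 2
    sparse = beta * H.sum()  # H is non-negative, so its L1 norm is just its sum
    return recon + orth + sparse


def soft_orthogonal_sparse_nmf(X, k, alpha=0.1, beta=0.1, n_iter=200, seed=0, eps=1e-9):
    """Heuristic multiplicative updates for the assumed objective above.

    X: non-negative (terms x documents) matrix; k: number of topics.
    Returns W (terms x k), H (k x documents) and the objective value per iteration.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    history = []
    for _ in range(n_iter):
        history.append(objective(X, W, H, alpha, beta))
        # Gradient w.r.t. H is W^T W H - W^T X + beta: the negative part goes in
        # the numerator, the positive part in the denominator.
        H *= (W.T @ X) / (W.T @ W @ H + beta + eps)
        # Gradient w.r.t. W is W H H^T - X H^T + 2*alpha*(W W^T W - W).
        W *= (X @ H.T + 2 * alpha * W) / (W @ (H @ H.T) + 2 * alpha * W @ (W.T @ W) + eps)
    history.append(objective(X, W, H, alpha, beta))
    return W, H, history


# Toy usage on synthetic data (not a real corpus).
_rng = np.random.default_rng(42)
X_demo = _rng.random((20, 3)) @ _rng.random((3, 25))
W_demo, H_demo, history_demo = soft_orthogonal_sparse_nmf(X_demo, k=3)
print("objective: first", history_demo[0], "last", history_demo[-1])
```

Raising `alpha` pushes the topic vectors in W toward mutual orthogonality (less overlap between topics), while raising `beta` drives more entries of H toward zero, so each document is explained by fewer topics.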
Pages: 61-70
Page count: 10