Modeling Emerging, Evolving and Fading Topics using Dynamic Soft Orthogonal NMF with Sparse Representation

Times cited: 22
Authors
Chen, Yong [1, 2]
Zhang, Hui [1, 2]
Wu, Junjie [3]
Wang, Xingguang [1]
Liu, Rui [1]
Lin, Mengxiang
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
[2] Natl Sci & Technol Resources Sharing Serv Engn Re, Beijing 100191, Peoples R China
[3] Beihang Univ, Sch Econ & Management, Beijing 100191, Peoples R China
Source
2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM) | 2015
Keywords
Dynamic Topic Model (DTM); Non-negative Matrix Factorization (NMF); Soft Orthogonality; Sparse Representation; Topic Detection and Tracking (TDT); NONNEGATIVE MATRIX;
DOI
10.1109/ICDM.2015.96
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Dynamic topic models (DTMs) are of great use for analyzing how the unobserved topics of a text collection evolve over time. Recent years have witnessed explosive growth of streaming text data from online media, which creates an unprecedented need for DTMs that support timely event analysis. Although the literature contains some matrix factorization methods for dynamic topic modeling, further study is still needed to model emerging, evolving and fading topics in a more natural and effective way. In light of this, we first propose a matrix factorization model called SONMFSR (Soft Orthogonal NMF with Sparse Representation), which makes full use of soft orthogonality and sparsity constraints for static topic modeling. Furthermore, by introducing constraints for emerging, evolving and fading topics into SONMFSR, we readily obtain a novel DTM called SONMFSRd for dynamic event analysis. Extensive experiments on two public corpora demonstrate that SONMFSRd outperforms several state-of-the-art DTMs in both topic detection and tracking. In particular, SONMFSRd shows great potential in real-world applications: popular topics of the 2015 Two Sessions are captured and traced dynamically for possible insights.
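To make the idea of a static NMF with soft orthogonality and sparsity constraints more concrete, here is a minimal sketch in plain Python. It is not the SONMFSR algorithm from the paper: the objective ||X - WH||_F^2 + alpha * ||W^T W - I||_F^2 + beta * sum(H), the terms-by-documents orientation of X, the weights `alpha` and `beta`, the multiplicative update rules, and every function name below are assumptions chosen for illustration. The dynamic constraints of SONMFSRd (emerging, evolving and fading topics) are not modeled.

```python
import random


def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def sq_dist(A, B):
    """Squared Frobenius norm of A - B."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def objective(X, W, H, alpha, beta):
    """Assumed objective: ||X - WH||^2 + alpha*||W^T W - I||^2 + beta*sum(H)."""
    k = len(H)
    eye = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    recon = sq_dist(X, matmul(W, H))
    ortho = sq_dist(matmul(transpose(W), W), eye)  # soft (penalized) orthogonality
    sparse = sum(map(sum, H))  # equals the L1 norm because H stays nonnegative
    return recon + alpha * ortho + beta * sparse


def soft_orthogonal_sparse_nmf(X, k, alpha=0.1, beta=0.1, iters=300, seed=0, eps=1e-9):
    """Hypothetical multiplicative-update solver; X is terms x documents."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]  # term-topic
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]  # topic-document
    history = [objective(X, W, H, alpha, beta)]
    for _ in range(iters):
        # H step: the L1 penalty contributes beta/2 to the denominator.
        Wt = transpose(W)
        num_h = matmul(Wt, X)
        den_h = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num_h[i][j] / (den_h[i][j] + beta / 2 + eps)
              for j in range(n)] for i in range(k)]
        # W step: the gradient of alpha*||W^T W - I||^2 is 4*alpha*(W W^T W - W);
        # its negative part goes to the numerator, its positive part to the denominator.
        Ht = transpose(H)
        num_w = matmul(X, Ht)
        den_w = matmul(W, matmul(H, Ht))
        wwtw = matmul(W, matmul(transpose(W), W))
        W = [[W[i][j] * (num_w[i][j] + 2 * alpha * W[i][j])
              / (den_w[i][j] + 2 * alpha * wwtw[i][j] + eps)
              for j in range(k)] for i in range(m)]
        history.append(objective(X, W, H, alpha, beta))
    return W, H, history


if __name__ == "__main__":
    # Toy corpus: terms 0-1 co-occur in documents 0-1, terms 2-3 in documents 2-3.
    X = [[3, 3, 0, 0], [3, 3, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2]]
    W, H, history = soft_orthogonal_sparse_nmf(X, k=2)
    print("objective:", round(history[0], 3), "->", round(history[-1], 3))
```

Because every numerator and denominator in the updates is nonnegative, W and H stay nonnegative throughout. The orthogonality penalty nudges the topic columns of W toward being distinct, while the L1 term pushes each document's topic weights in H toward zero.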
Pages: 61-70
Number of pages: 10