Hierarchical latent semantic mapping for automated topic generation

Cited by: 1
Authors
Zhou G. [1]
Chen G. [1]
Affiliation
[1] School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, No. 10 Xi Tu Cheng Road, Beijing
Keywords
LDA; Network; Topic modeling; Unsupervised learning
DOI
10.2991/ijndc.2016.4.2.6
Abstract
A great deal of information resides in an unprecedented volume of text data. Managing the allocation of such large-scale text data is an important problem in many areas, and topic modeling performs well on it. The traditional generative models (PLSA, LDA) are the state-of-the-art approaches in topic modeling, and most recent research on topic generation has focused on improving or extending these models. However, the results of traditional generative models are sensitive to the number of topics K, which must be specified manually and determines the rank of the solution space for topic generation. The problem of generating topics from a corpus resembles community detection in networks, where many effective algorithms detect communities automatically, without a manually specified number of communities. Inspired by these algorithms, we propose a novel method named Hierarchical Latent Semantic Mapping (HLSM), which automatically generates topics from a corpus. HLSM calculates the association between each pair of words in the latent topic space, constructs a unipartite network of words from these associations, and hierarchically generates topics from this network. We apply HLSM to several document collections, and experimental comparisons against several state-of-the-art approaches demonstrate its promising performance.
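The pipeline outlined in the abstract (pairwise word association in a latent space, a unipartite word network, hierarchical topic generation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `LATENT` vectors are hypothetical, cosine similarity stands in for the association measure, connected components stand in for a community detection algorithm, and the thresholds are arbitrary.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity, used here as an assumed word-association measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_network(latent, threshold):
    """Build a unipartite word graph: link two words whose association passes the threshold."""
    graph = {w: set() for w in latent}
    for a, b in combinations(latent, 2):
        if cosine(latent[a], latent[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def components(graph):
    """Connected components, a simple stand-in for community detection."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


def hierarchical_topics(latent, thresholds):
    """Re-split every group at each stricter threshold, giving one topic level per threshold."""
    levels = [[sorted(latent)]]
    for t in thresholds:
        next_level = []
        for group in levels[-1]:
            sub = {w: latent[w] for w in group}
            next_level.extend(components(word_network(sub, t)))
        levels.append(next_level)
    return levels[1:]


# Hypothetical 2-dimensional latent vectors (for example, word loadings on two latent factors).
LATENT = {
    "goal": (0.9, 0.1),
    "match": (0.8, 0.2),
    "team": (0.7, 0.3),
    "stock": (0.1, 0.9),
    "market": (0.2, 0.8),
    "bank": (0.3, 0.7),
}

levels = hierarchical_topics(LATENT, thresholds=[0.9, 0.99])
for depth, topics in enumerate(levels, start=1):
    print(depth, topics)
```

Note that the number of topics at each level is never passed in; it falls out of the network structure, which is the property the abstract contrasts with a manually specified K.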
Pages: 127-136
Page count: 9
Related papers
19 in total
[1] Barathi B., Cross-domain text classification using semantic based approach, Sustainable Energy and Intelligent Systems (SEISCon 2011), International Conference on, pp. 820-825, (2011)
[2] Zhao R., Mao K., Supervised adaptive-transfer PLSA for cross-domain text classification, Data Mining Workshop (ICDMW), 2014 IEEE International Conference on, pp. 259-266, (2014)
[3] Ghazifard A., Shamaee Z., Shams M., Topic word set-based text clustering, E-Commerce in Developing Countries: With Focus on E-Security (ECDC), 2013 7th International Conference on, pp. 1-10, (2013)
[4] Chang H.-C., Hsu C.-C., Using topic keyword clusters for automatic document clustering, Information Technology and Applications, 2005. ICITA 2005. Third International Conference on, 1, pp. 419-424, (2005)
[5] Wilson J., Chaudhury S., Lall B., Improving collaborative filtering based recommenders using topic modelling, Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on, 1, pp. 340-346, (2014)
[6] Niebles J., Wang H., Fei-Fei L., Unsupervised learning of human action categories using spatial-temporal words, International Journal of Computer Vision, 79, 3, pp. 299-318, (2008)
[7] Hofmann T., Probabilistic latent semantic indexing, SIGIR, pp. 50-57, (1999)
[8] Blei D.M., Ng A.Y., Jordan M.I., Latent Dirichlet allocation, Journal of Machine Learning Research, 3, pp. 993-1022, (2003)
[9] Blei D.M., Lafferty J.D., Dynamic topic models, Proceedings of the 23rd International Conference on Machine Learning, Ser. ICML '06, pp. 113-120, (2006)
[10] Blei D.M., Griffiths T.L., Jordan M.I., The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies, J. ACM, 57, 2, pp. 7:1-7:30, (2010)