Efficient Learning Algorithm for Maximum Entropy Discrimination Topic Models

Cited: 0
Authors
Chen J. [1 ]
Zhu J. [1 ]
Affiliations
[1] Department of Computer Science and Technology, Tsinghua University, Beijing
Source
Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence | 2019, Vol. 32, No. 8
Funding
National Natural Science Foundation of China;
Keywords
Coordinate Descent; Gibbs Sampling; Rejection Sampling; Supervised Topic Models;
DOI
10.16451/j.cnki.issn1003-6059.201908007
Abstract
The time complexity of existing training algorithms for supervised topic models is generally linear in the number of topics, which limits their large-scale application. To address this problem, an efficient learning algorithm for the maximum entropy discrimination latent Dirichlet allocation (MedLDA) supervised topic model is proposed. The algorithm is based on coordinate descent and requires fewer classifier-training iterations than the existing Monte Carlo algorithm for MedLDA. It also exploits rejection sampling and an efficient preprocessing technique to reduce the time complexity of training from linear to sub-linear in the number of topics. Comparison experiments on multiple text corpora show that the proposed algorithm achieves a substantial speedup in training over the existing Monte Carlo algorithm. © 2019, Science Press. All rights reserved.
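The sub-linear cost claimed in the abstract hinges on rejection sampling: a topic assignment can be drawn in proportion to per-topic weights without summing over all K topics, provided an easily sampled proposal distribution bounds the target. The paper itself gives no code; the following minimal Python sketch (function and parameter names are hypothetical, not from the paper) illustrates only this generic rejection-sampling idea, not the full MedLDA sampler.

```python
import random

def rejection_sample_topic(target_weight, proposal_sample, proposal_weight, envelope):
    """Draw a topic index k with probability proportional to target_weight(k),
    without computing the normalizer over all K topics.

    Requires target_weight(k) <= envelope * proposal_weight(k) for every k.
    Each attempt touches only the single proposed topic, so the expected cost
    per draw is independent of K when the acceptance rate is bounded below.
    """
    while True:
        k = proposal_sample()  # cheap draw from the proposal (e.g. an alias table)
        if random.random() * envelope * proposal_weight(k) <= target_weight(k):
            return k
```

In practice the proposal would be a precomputed structure (such as an alias table over a slowly changing part of the weights), rebuilt only occasionally during preprocessing, so that each Gibbs-sampling draw costs O(1) expected work instead of O(K).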
Pages: 736-745
Page count: 9