Text Categorization Based on Topic Model

Cited by: 0
Authors
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu Province [1]
221116, China
Unknown [2]
100081, China
Affiliations
[1] School of Computer Science and Technology, China University of Mining and Technology, Jiangsu Province, Xuzhou
[2] School of Computer Science and Technology, Beijing Institute of Technology, Haidian District, Beijing
Source
Int. J. Comput. Intell. Syst. | 2009, vol. 2, no. 4, pp. 398-409
Keywords
Category Language Model; Latent Dirichlet Allocation; Topic model; Variational Inference
DOI
10.2991/ijcis.2009.2.4.8
Abstract
Many topic models have been proposed in the text-processing literature to represent documents and words in terms of latent topics, so that text can be processed effectively and accurately. In this paper, we propose the Latent Dirichlet Allocation Category Language Model (LDACLM) for text categorization and estimate its parameters by variational inference. As a variant of Latent Dirichlet Allocation, LDACLM treats the documents of each category as a language model and uses the variational parameters to compute maximum a posteriori estimates for terms. Experiments show that LDACLM is effective: it outperforms Naïve Bayes with Laplace smoothing and the Rocchio algorithm, but is slightly inferior to SVM for text categorization. © 2009, the authors.
Pages: 398-409
Number of pages: 11
Related papers
50 in total
  • [21] Gaussian Process Based Text Categorization for Healthy Information
    Chen, Sih-Huei
    Lee, Yuan-Shan
    Tai, Tzu-Chiang
    Wang, Jia-Ching
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ORANGE TECHNOLOGIES (ICOT), 2015, : 30 - 33
  • [22] A Survey of Topic Models in Text Classification
    Xia, Linzhong
    Luo, Dean
    Zhang, Chunxiao
    Wu, Zhou
    2019 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA (ICAIBD 2019), 2019, : 244 - 250
  • [23] LDA-based Keyword Selection in Text Categorization
    Tasci, Serafettin
    Gungor, Tunga
    2009 24TH INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2009, : 229 - 234
  • [24] An enhanced short text categorization model with deep abundant representation
    Gu, Yanhui
    Gu, Min
    Long, Yi
    Xu, Guandong
    Yang, Zhenglu
    Zhou, Junsheng
    Qu, Weiguang
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2018, 21 (06) : 1705 - 1719
  • [25] An enhanced short text categorization model with deep abundant representation
    Yanhui Gu
    Min Gu
    Yi Long
    Guandong Xu
    Zhenglu Yang
    Junsheng Zhou
    Weiguang Qu
    World Wide Web, 2018, 21 : 1705 - 1719
  • [26] Document Similarity Measure Based on Topic Model
    He, Ming
    Wang, Zhen-zhen
    Du, Yong-ping
    APPLIED SCIENCE, MATERIALS SCIENCE AND INFORMATION TECHNOLOGIES IN INDUSTRY, 2014, 513-517 : 1280 - 1284
  • [27] Clustering Based Topic Events Detection on Text Stream
    Li, Chunshan
    Ye, Yunming
    Zhang, Xiaofeng
    Chu, Dianhui
    Deng, Shengchun
    Xu, Xiaofei
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT 1, 2014, 8397 : 42 - 52
  • [28] A Distributed Topic Model for Large-Scale Streaming Text
    Li, Yicong
    Feng, Dawei
    Lu, Menglong
    Li, Dongsheng
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2019, PT II, 2019, 11776 : 37 - 48
  • [29] A short text sentiment-topic model for product reviews
    Xiong, Shufeng
    Wang, Kuiyi
    Ji, Donghong
    Wang, Bingkun
    NEUROCOMPUTING, 2018, 297 : 94 - 102
  • [30] SenU-PTM: a novel phrase-based topic model for short-text topic discovery by exploiting word embeddings
    Lu, Heng-Yang
    Zhang, Yi
    Du, Yuntao
    DATA TECHNOLOGIES AND APPLICATIONS, 2021, 55 (05) : 643 - 660