SPARSE TOPIC MODEL FOR TEXT CLASSIFICATION

Times cited: 0
Author
Liu, Tao [1 ]
Affiliation
[1] Renmin Univ China, Sch Informat, Beijing 100872, Peoples R China
Source
PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4 | 2013
Keywords
Text classification; Topic model; Sparse coding;
DOI
Not available
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a new text classification method, the Sparse Topic Model, which represents documents by the sparse coding of topics. Topics carry more semantic information than individual words, so they serve as more effective features for representing documents. Topics are first extracted from documents by LDA in an unsupervised way; sparse coding is then applied over these topic representations to discover a higher-level representation. We compare the Sparse Topic Model with traditional methods such as SVM, and the experimental results show that the proposed method achieves better performance, especially when the number of training examples is limited. The effect of the number of topics and the number of words per topic on performance is also investigated. Because the Sparse Topic Model is unsupervised, it is well suited to real applications.
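The pipeline described in the abstract (LDA topic extraction, then sparse coding of topic vectors, then classification) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the library choices (scikit-learn), the toy data, the dictionary construction, and all parameter values are assumptions.

```python
# Hedged sketch of the abstract's pipeline: LDA topics -> sparse coding
# -> classifier. All components and parameters are illustrative assumptions.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation, SparseCoder
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy term-count matrix: 40 documents x 30 vocabulary terms (synthetic data).
X_counts = rng.poisson(1.0, size=(40, 30))
y = rng.integers(0, 2, size=40)  # toy binary class labels

# Step 1: unsupervised topic extraction with LDA.
lda = LatentDirichletAllocation(n_components=8, random_state=0)
doc_topics = lda.fit_transform(X_counts)  # (40, 8) per-document topic mixtures

# Step 2: sparse coding of the topic vectors. Here the dictionary is just a
# normalized subset of the topic vectors (an assumption for illustration);
# in practice a dictionary would be learned from the training set.
atoms = doc_topics[:6]
dictionary = atoms / np.linalg.norm(atoms, axis=1, keepdims=True)
coder = SparseCoder(dictionary=dictionary,
                    transform_algorithm="lasso_lars",
                    transform_alpha=0.01)
codes = coder.transform(doc_topics)  # (40, 6) sparse high-level features

# Step 3: classify documents using the sparse codes as features.
clf = LogisticRegression(max_iter=1000).fit(codes, y)
print(codes.shape, clf.score(codes, y))
```

Note that steps 1 and 2 never look at the labels `y`; only the final classifier is supervised, which matches the abstract's point that the representation itself is learned in an unsupervised way.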
Pages: 1916-1920
Number of pages: 5
Related papers
12 records
  • [1] Androutsopoulos I., 2000, SIGIR Forum, V34, P160
  • [2] [Anonymous], 1999, CLAIMING PLACE P 15
  • [3] [Anonymous], 2002, ICML
  • [4] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [5] DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
  • [7] Dalal M. K., 2011, INT J COMPUTER APPL, V28, DOI 10.5120/3358-4633
  • [8] Lee H., 2007, Adv Neural Inform Process Syst, P801, DOI 10.5555/2976456.2976557
  • [9] Mak H, 2003, IEEE/WIC INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, PROCEEDINGS, P602
  • [10] McCallum, 2002, MALLET MACHINE LEARN