Building Vietnamese Topic Modeling Based on Core Terms and Applying in Text Classification

Cited by: 7
Authors
Ha Nguyen Thi Thu [1]
Tinh Dao Thanh [2]
Thanh Nguyen Hai [3]
Vinh Ho Ngoc [4]
Affiliations
[1] Elect Power Univ, Dept E Commerce, Hanoi, Vietnam
[2] Le Quy Don Tech Univ, Informat Technol Fac, Hanoi, Vietnam
[3] Vietnam Minist Educ & Training, Hanoi, Vietnam
[4] Vinh Univ Technol Educ, Nghean, Vietnam
Source
2015 FIFTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT2015) | 2015
Keywords
Vietnamese text; Text mining; Topic modeling; text classification; word processing; CATEGORIZATION;
DOI
10.1109/CSNT.2015.22
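Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract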
In a language, the occurrence of words indicates the meaning of a text's content. Generative models for text, such as topic models, have the potential to make important contributions to the statistical analysis of large document collections and to the development of a deeper understanding of human language learning and processing. In this paper, we propose a novel method for building a Vietnamese topic model based on core terms and conditional probability. This approach reduces the time needed to build the corpus. We then apply the model to Vietnamese text classification, and the experiments show that the corpus makes the classification system more effective than traditional methods, with higher accuracy and less complex data processing.
Pages: 1284-1288
Number of pages: 5