Incorporating topic transition in topic detection and tracking algorithms

Cited by: 10
Authors
Zeng, Jianping [1]
Zhang, Shiyong [1]
Affiliation
[1] Fudan Univ, Dept Comp & Informat Technol, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Topic transition; Topic detection and tracking; Hidden Markov model
DOI
10.1016/j.eswa.2007.09.013
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Topics often transition among the documents in a document collection. To improve the accuracy of topic detection and tracking (TDT) algorithms in discovering topics or classifying documents, it is necessary to make full use of this topic transition information. However, TDT algorithms usually find topics using topic models such as LDA and pLSI, which are mixture models and make topic transitions difficult to represent and implement. A topic transition model based on the hidden Markov model is presented, and learning the topic transitions from documents is discussed. Based on the model, two TDT algorithms incorporating topic transition, one for topic discovery and one for document classification, are provided to show the application of the proposed model. Experiments with the two algorithms on two real-world document collections, together with a performance comparison against other similar algorithms, show that accuracy reaches 93% for topic discovery on Reuters-21578 and 97.3% for document classification. Furthermore, the topic transitions discovered by the algorithm on a dataset collected from a BBS website are consistent with the results of manual analysis. (C) 2007 Elsevier Ltd. All rights reserved.
Pages: 227-232
Number of pages: 6