Exploring the use of latent topical information for statistical Chinese spoken document retrieval

Cited by: 13
Authors
Chen, B [1]
Affiliation
[1] Natl Taiwan Normal Univ, Grad Inst Comp Sci & Informat Engn, Taipei 116, Taiwan
Keywords
information retrieval; topical mixture model; probabilistic latent semantic analysis model; vector space model; latent semantic indexing model; HMM/N-gram retrieval model;
DOI
10.1016/j.patrec.2005.06.010
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Information retrieval, which aims to provide people with easy access to all kinds of information, is receiving more and more emphasis. However, most approaches to information retrieval are primarily based on literal term matching and operate in a deterministic manner. Their performance is therefore often limited by vocabulary mismatch and cannot be steadily improved through use. To overcome these drawbacks and to enhance retrieval performance, in this paper we explore the use of the topical mixture model for statistical Chinese spoken document retrieval. Various kinds of model structures and learning approaches were extensively investigated. In addition, the retrieval capabilities were verified by comparison with the probabilistic latent semantic analysis model, the vector space model, and the latent semantic indexing model, as well as our previously presented HMM/N-gram retrieval model. The experiments were performed on the TDT Chinese collections (TDT-2 and TDT-3). Noticeable improvements in retrieval performance were obtained. (c) 2005 Elsevier B.V. All rights reserved.
Pages: 9-18
Number of pages: 10
Related papers
46 in total
  • [1] [Anonymous], P 24 ANN INT ACM SIG, DOI 10.1145/383952.384019
  • [2] [Anonymous], 2000, PROJ TOP DET TRACK
  • [3] Baeza-Yates R.A., 1999, Modern Information Retrieval
  • [4] Syllable-based Chinese text/spoken document retrieval using text/speech queries
    Bai, BR
    Chen, BL
    Wang, HM
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2000, 14 (05) : 603 - 616
  • [5] A CLUSTERING TECHNIQUE FOR SUMMARIZING MULTIVARIATE DATA
    BALL, GH
    HALL, DJ
    [J]. BEHAVIORAL SCIENCE, 1967, 12 (02) : 153 - &
  • [6] Statistical language model adaptation: review and perspectives
    Bellegarda, JR
    [J]. SPEECH COMMUNICATION, 2004, 42 (01) : 93 - 108
  • [7] Large vocabulary continuous speech recognition of Broadcast News - The Philips/RWTH approach
    Beyerlein, P
    Aubert, X
    Haeb-Umbach, R
    Harris, M
    Klakow, D
    Wendemuth, A
    Molau, S
    Ney, H
    Pitz, M
    Sixtus, A
    [J]. SPEECH COMMUNICATION, 2002, 37 (1-2) : 109 - 131
  • [8] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [9] Automatic recognition of spontaneous speech for access to multilingual oral history archives
    Byrne, W
    Doermann, D
    Franz, MT
    Gustman, S
    Hajic, J
    Oard, D
    Picheny, M
    Psutka, J
    Ramabhadran, B
    Soergel, D
    Ward, T
    Zhu, WJ
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2004, 12 (04) : 420 - 435
  • [10] A system for spoken query information retrieval on mobile devices
    Chang, E
    Seide, F
    Meng, HM
    Chen, ZR
    Shi, Y
    Li, YC
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2002, 10 (08) : 531 - 541