Spoken Document Retrieval With Unsupervised Query Modeling Techniques

Cited by: 23
Authors
Chen, Berlin [1]
Chen, Kuan-Yu [2]
Chen, Pei-Ning [1]
Chen, Yi-Wen [1]
Affiliations
[1] Natl Taiwan Normal Univ, Dept Comp Sci & Informat Engn, Taipei 116, Taiwan
[2] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012 / Vol. 20 / No. 9
Keywords
Query modeling; relevance class; spoken document retrieval; topic modeling
DOI
10.1109/TASL.2012.2208628
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Ever-increasing amounts of publicly available multimedia associated with speech information have motivated spoken document retrieval (SDR) to be an active area of intensive research in the speech processing community. Much work has been dedicated to developing elaborate indexing and modeling techniques for representing spoken documents, but little to improving query formulations for better representing the information needs of users. The latter is critical to the success of an SDR system. In view of this, we present in this paper a novel use of a relevance language modeling framework for SDR. It not only inherits the merits of several existing techniques but also provides a principled way to render the lexical and topical relationships between a query and a spoken document. We further explore various ways to glean both relevance and non-relevance cues from the spoken document collection so as to enhance query modeling in an unsupervised fashion. In addition, we investigate representing the query and documents with different granularities of index features to work in conjunction with the various relevance and/or non-relevance cues. Empirical evaluations performed on the TDT (Topic Detection and Tracking) collections reveal that the methods derived from our modeling framework hold good promise for SDR and are very competitive with existing retrieval methods.
Pages: 2602-2612
Number of pages: 11