Open-vocabulary spoken term detection using graphone-based hybrid recognition systems

Cited by: 29
Authors
Akbacak, Murat [1]
Vergyri, Dimitra [1]
Stolcke, Andreas [1]
Affiliations
[1] SRI Int, Speech Technol & Res Lab, Menlo Pk, CA 94025 USA
Source
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 | 2008
Keywords
spoken term detection; audio indexing; voice search; open vocabulary
DOI
10.1109/ICASSP.2008.4518841
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We address the problem of retrieving out-of-vocabulary (OOV) words/queries from audio archives for the spoken term detection (STD) task. Many STD systems use the output of an automatic speech recognition (ASR) system that has a limited, fixed vocabulary, and so cannot detect rare words of high information content, such as named entities. Since such words are often of great interest in a retrieval task, it is important to index spoken archives in a way that allows a user to search for an OOV query/term. In this work, we employ hybrid recognition systems, which contain both words and subword units (graphones), to generate hybrid lattice indexes. We use a word-based STD system as our baseline and present improvements from our proposed hybrid STD system, which uses words plus graphones, on the English broadcast news genre of the 2006 NIST STD task.
Pages: 5240-5243
Number of pages: 4