Turkish Broadcast News Transcription and Retrieval

Cited by: 54
Authors
Arisoy, Ebru [1]
Can, Dogan [1]
Parlak, Siddika [1]
Sak, Hasim [2]
Saraclar, Murat [1]
Affiliations
[1] Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey
[2] Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2009 / Vol. 17 / Issue 05
Keywords
Discriminative training; language modeling (LM); morphologically rich languages; speech recognition; spoken term detection;
DOI
10.1109/TASL.2008.2012313
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper summarizes our recent efforts to build a Turkish Broadcast News transcription and retrieval system. The agglutinative nature of Turkish leads to a high number of out-of-vocabulary (OOV) words, which in turn lower automatic speech recognition (ASR) accuracy. This situation compromises the performance of speech retrieval systems based on ASR output. Therefore, a word-based ASR system is not adequate for transcribing speech in Turkish. To alleviate this problem, various sub-word-based recognition units are utilized. These units solve the OOV problem with moderate-size vocabularies and perform even better than a 500K-word vocabulary as far as recognition accuracy is concerned. As a novel approach, the interaction between recognition units (words and sub-words) and discriminative training is explored. Sub-word models benefit from discriminative training more than word models do, especially in the discriminative language modeling framework. For speech retrieval, a spoken term detection system based on automata indexation is utilized. As with transcription, retrieval performance is measured under various schemes incorporating words and sub-words. The best results are obtained using a cascade of word and sub-word indexes together with term-specific thresholding.
Pages: 874-883
Page count: 10