Turkish Broadcast News Transcription and Retrieval

Cited by: 54
Authors
Arisoy, Ebru [1]
Can, Dogan [1]
Parlak, Siddika [1]
Sak, Hasim [2]
Saraclar, Murat [1]
Affiliations
[1] Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey
[2] Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2009 / Vol. 17 / No. 05
Keywords
Discriminative training; language modeling (LM); morphologically rich languages; speech recognition; spoken term detection
DOI
10.1109/TASL.2008.2012313
CLC number (Chinese Library Classification)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper summarizes our recent efforts to build a Turkish Broadcast News transcription and retrieval system. The agglutinative nature of Turkish leads to a high number of out-of-vocabulary (OOV) words, which in turn lowers automatic speech recognition (ASR) accuracy. This degrades the performance of speech retrieval systems that rely on ASR output. Therefore, word-based ASR is not adequate for transcribing Turkish speech. To alleviate this problem, various sub-word-based recognition units are used. These units solve the OOV problem with moderate-size vocabularies and achieve even higher recognition accuracy than a 500K-word vocabulary. As a novel approach, we explore the interaction between the choice of recognition units (words and sub-words) and discriminative training. Sub-word models benefit more from discriminative training than word models do, especially in the discriminative language modeling framework. For speech retrieval, we use a spoken term detection system based on automata indexation. As with transcription, retrieval performance is measured under various schemes that use words and sub-words. The best results are obtained using a cascade of word and sub-word indexes together with term-specific thresholding.
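The abstract names term-specific thresholding but does not state the rule. As an illustration only, the sketch below assumes the expected-ATWV threshold used in the NIST 2006 Spoken Term Detection setting: a detection with posterior p is kept if p > β·N_true / (T + (β − 1)·N_true), where T is the total speech duration in seconds, N_true (the term's true count) is estimated by summing the posteriors of its candidate detections, and β = 999.9. The function names, the `(start_time, posterior)` hit format, and the example term are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Assumed ATWV false-alarm weight (NIST STD 2006 setting); not stated in the abstract.
BETA = 999.9

Hit = Tuple[float, float]  # hypothetical format: (start_time_seconds, posterior)


def term_specific_threshold(posteriors: List[float], total_seconds: float,
                            beta: float = BETA) -> float:
    """Posterior threshold that maximizes expected ATWV for a single term.

    N_true is estimated as the sum of the candidate detections' posteriors,
    so rare terms get a lower threshold than frequent ones.
    """
    n_true = sum(posteriors)
    return beta * n_true / (total_seconds + (beta - 1.0) * n_true)


def apply_tst(detections: Dict[str, List[Hit]], total_seconds: float,
              beta: float = BETA) -> Dict[str, List[Hit]]:
    """Keep only the hits whose posterior exceeds their term's own threshold."""
    kept: Dict[str, List[Hit]] = {}
    for term, hits in detections.items():
        theta = term_specific_threshold([p for _, p in hits], total_seconds, beta)
        kept[term] = [(t, p) for t, p in hits if p > theta]
    return kept


if __name__ == "__main__":
    # One hour of audio; three candidate hits for a hypothetical query term.
    result = apply_tst({"ankara": [(1.0, 0.9), (5.0, 0.3), (9.0, 0.05)]}, 3600.0)
    print(result)
```

With one hour of audio, the three hits above give N_true ≈ 1.25 and a threshold of about 0.26, so the low-posterior hit at 9.0 s is dropped. A single global threshold cannot make this per-term trade-off between misses and false alarms.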
Pages: 874-883
Number of pages: 10