Turkish Broadcast News Transcription and Retrieval

Cited by: 54

Authors:

Arisoy, Ebru [1]

Can, Dogan [1]

Parlak, Siddika [1]

Sak, Hasim [2]

Saraclar, Murat [1]

Affiliations:

[1] Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey

[2] Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2009 / Vol. 17 / No. 5

Keywords:

Discriminative training; language modeling (LM); morphologically rich languages; speech recognition; spoken term detection;

DOI:

10.1109/TASL.2008.2012313

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403;

Abstract:

This paper summarizes our recent efforts to build a Turkish Broadcast News transcription and retrieval system. The agglutinative nature of Turkish leads to a high number of out-of-vocabulary (OOV) words, which in turn lower automatic speech recognition (ASR) accuracy. This situation compromises the performance of speech retrieval systems based on ASR output. Therefore, a word-based ASR system is not adequate for transcribing speech in Turkish. To alleviate this problem, various sub-word-based recognition units are utilized. These units solve the OOV problem with moderate-size vocabularies and perform even better than a 500K-word vocabulary as far as recognition accuracy is concerned. As a novel approach, the interaction between recognition units (words and sub-words) and discriminative training is explored. Sub-word models benefit from discriminative training more than word models do, especially in the discriminative language modeling framework. For speech retrieval, a spoken term detection system based on automata indexation is utilized. As with transcription, retrieval performance is measured under various schemes incorporating words and sub-words. The best results are obtained using a cascade of word and sub-word indexes together with term-specific thresholding.


Pages: 874-883

Page count: 10
