Turkish Broadcast News Transcription and Retrieval

Cited by: 54

Authors:

Arisoy, Ebru [1]

Can, Dogan [1]

Parlak, Siddika [1]

Sak, Hasim [2]

Saraclar, Murat [1]

Affiliations:

[1] Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey

[2] Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2009 / Volume 17 / Issue 05

Keywords:

Discriminative training; language modeling (LM); morphologically rich languages; speech recognition; spoken term detection

DOI:

10.1109/TASL.2008.2012313

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

This paper summarizes our recent efforts for building a Turkish Broadcast News transcription and retrieval system. The agglutinative nature of Turkish leads to a high number of out-of-vocabulary (OOV) words which in turn lower automatic speech recognition (ASR) accuracy. This situation compromises the performance of speech retrieval systems based on ASR output. Therefore using a word-based ASR is not adequate for transcribing speech in Turkish. To alleviate this problem, various sub-word-based recognition units are utilized. These units solve the OOV problem with moderate size vocabularies and perform even better than a 500 K word vocabulary as far as recognition accuracy is concerned. As a novel approach, the interaction between recognition units, words and sub-words, and discriminative training is explored. Sub-word models benefit from discriminative training more than word models do, especially in the discriminative language modeling framework. For speech retrieval, a spoken term detection system based on automata indexation is utilized. As with transcription, retrieval performance is measured under various schemes incorporating words and sub-words. Best results are obtained using a cascade of word and sub-word indexes together with term-specific thresholding.

Citation

Pages: 874-883

Page count: 10

References (57 in total):

[1] AKSUNGURLU T, 2008, P IEEE SIU DID TURK, P1

[2] ALLAUZEN C, 2004, P WORKSH INT APPR SP, P33

[3] [Anonymous], SPEECH RECOGNITION H

[4] [Anonymous], 2005, P 43 ANN M ASS COMP

[5] Speech and sliding text aided sign retrieval from hearing impaired sign news videos [J].

Aran, Oya; Ari, Ismail; Akarun, Lale; Dikici, Erinc; Parlak, Siddika; Saraclar, Murat; Campr, Pavel; Hruz, Marek.

JOURNAL ON MULTIMODAL USER INTERFACES, 2008, 2 (02): 117-131

[6] ARISOY E, 2008, P INT BRISB AUSTR, P825

[7] ARISOY E, 2007, P INT EUR ANTW BELG, P2381

[8] A unified language model for large vocabulary continuous speech recognition of Turkish [J].

Arisoy, Ebru; Dutagaci, Helin; Arslan, Levent M.

SIGNAL PROCESSING, 2006, 86 (10): 2844-2862

[9] Lattice Extension and Vocabulary Adaptation for Turkish LVCSR [J].

Arisoy, Ebru; Saraclar, Murat.

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (01): 163-173

[10] Bahl L., 1986, INT C ACOUSTICS SPEE, P49
