Lattice Indexing for Spoken Term Detection

Cited by: 116

Authors:

Can, Dogan [1]

Saraclar, Murat [2]

Affiliations:

[1] Univ So Calif, Los Angeles, CA 90089 USA

[2] Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 08

Keywords:

Factor automata; lattice indexing; speech retrieval (SR); spoken term detection (STD); weighted finite-state transducers; FINITE-STATE TRANSDUCERS; LANGUAGE;

DOI:

10.1109/TASL.2011.2134087

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper considers the problem of constructing an efficient inverted index for the spoken term detection (STD) task. More specifically, we construct a deterministic weighted finite-state transducer storing soft-hits in the form of (utterance ID, start time, end time, posterior score) quadruplets. We propose a generalized factor transducer structure which retains the time information necessary for performing STD. The required information is embedded into the path weights of the factor transducer without disrupting the inherent optimality. We also describe how to index all substrings seen in a collection of raw automatic speech recognition lattices using the proposed structure. Our STD indexing/search implementation is built upon the OpenFst Library and is designed to scale well to large problems. Experiments on Turkish and English data sets corroborate our claims.
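To make the idea of an index over all substrings (factors) with soft-hits concrete, the following is a minimal sketch only. It is not the weighted finite-state transducer construction the abstract describes and does not use OpenFst: a plain Python dictionary stands in for the factor transducer, each utterance is assumed to be a single time-aligned word sequence rather than a lattice, and the score of a factor is assumed, for illustration, to be the product of per-word posteriors. The names `Hit`, `build_factor_index`, and `search` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One soft-hit: (utterance ID, start time, end time, posterior score)."""
    utterance_id: str
    start: float
    end: float
    score: float


def build_factor_index(utterances):
    """Map every contiguous word sequence (factor) to its list of hits.

    `utterances` maps an utterance ID to a list of
    (word, start_time, end_time, posterior) tuples in time order.
    Assumption: the factor score is the product of its word posteriors.
    """
    index = defaultdict(list)
    for utt_id, words in utterances.items():
        for i in range(len(words)):
            score = 1.0
            for j in range(i, len(words)):
                score *= words[j][3]
                factor = tuple(w[0] for w in words[i:j + 1])
                index[factor].append(
                    Hit(utt_id, words[i][1], words[j][2], score)
                )
    return index


def search(index, query):
    """Return the hits for a whitespace-separated query, best score first."""
    hits = index.get(tuple(query.split()), [])
    return sorted(hits, key=lambda h: h.score, reverse=True)


if __name__ == "__main__":
    toy = {
        "utt1": [("spoken", 0.0, 0.4, 0.9), ("term", 0.4, 0.7, 0.8)],
        "utt2": [("term", 1.0, 1.2, 0.6)],
    }
    for hit in search(build_factor_index(toy), "term"):
        print(hit)
```

This toy enumerates factors explicitly, so its size grows quadratically with utterance length. The abstract's point is that a deterministic factor transducer stores the same information compactly, with the time information carried in the path weights.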

Citation

Pages: 2338-2347

Page count: 10

25 references in total

[1] ALLAUZEN C, 2004, P WORKSH INT APPR SP, P33

[2] Allauzen C, 2007, LECT NOTES COMPUT SC, V4783, P11

[3] [Anonymous], 1997, ACM SIGACT NEWS

[4] Arisoy, Ebru; Can, Dogan; Parlak, Siddika; Sak, Hasim; Saraclar, Murat. Turkish Broadcast News Transcription and Retrieval [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (05): 874-883

[5] BLUMER, A; BLUMER, J; HAUSSLER, D; EHRENFEUCHT, A; CHEN, MT; SEIFERAS, J. THE SMALLEST AUTOMATON RECOGNIZING THE SUBWORDS OF A TEXT [J]. THEORETICAL COMPUTER SCIENCE, 1985, 40 (01): 31-55

[6] BLUMER, A; BLUMER, J; HAUSSLER, D; MCCONNELL, R; EHRENFEUCHT, A. COMPLETE INVERTED FILES FOR EFFICIENT TEXT RETRIEVAL AND ANALYSIS [J]. JOURNAL OF THE ACM, 1987, 34 (03): 578-595

[7] Can, Dogan; Cooper, Erica; Sethy, Abhinav; White, Chris; Ramabhadran, Bhuvana; Saraclar, Murat. EFFECT OF PRONUNCIATIONS ON OOV QUERIES IN SPOKEN TERM DETECTION [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009: 3957-+

[8] Chaudhari, Upendra V.; Picheny, Michael. Improvements in phone based audio search via constrained match with high order confusion estimates [J]. 2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007: 665-670

[9] Chelba C., 2005, Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, P443, DOI 10.3115/1219840.1219895

[10] CROCHEMORE, M. TRANSDUCERS AND REPETITIONS [J]. THEORETICAL COMPUTER SCIENCE, 1986, 45 (01): 63-86
