Query-by-Example Spoken Term Detection For OOV Terms

Cited by: 40

Authors:

Parada, Carolina [1]

Sethy, Abhinav [2]

Ramabhadran, Bhuvana [2]

Affiliations:

[1] Johns Hopkins Univ, Ctr Language & Speech Proc, 3400 N Charles St, Baltimore, MD 21210 USA

[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10568 USA

Source:

2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009) | 2009

DOI:

10.1109/ASRU.2009.5373341

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The goal of Spoken Term Detection (STD) technology is to allow open-vocabulary search over large collections of speech content. In this paper, we address cases where the search terms of interest (queries) are acoustic examples. These are provided either by identifying a region of interest in a speech stream or by speaking the query term. Queries often relate to named entities and foreign words, which typically have poor coverage in the vocabulary of Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Throughout this paper, we focus on query-by-example search for such out-of-vocabulary (OOV) query terms. We build upon a finite state transducer (FST) based search and indexing system [1] to address query-by-example search for OOV terms by representing both the query and the index as phonetic lattices from the output of an LVCSR system. We provide results comparing different representations and generation mechanisms for both queries and indexes built with word and combined word and subword units [2]. We also present a two-pass method that runs a query-by-example search using the best hit identified in an initial pass to augment the STD search results. The results demonstrate that query-by-example search can yield significantly better performance: an Actual Term-Weighted Value (ATWV) of 0.479, compared with a baseline ATWV of 0.325 that uses reference pronunciations for OOVs. Further improvements can be obtained with the proposed two-pass approach and with filtering using the expected unigram counts from the LVCSR system's lexicon.
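The abstract reports results as ATWV but does not spell out the metric. As a rough illustration, the sketch below computes a term-weighted value from per-term counts, assuming the definition commonly used in the NIST 2006 STD evaluation: one minus the average over terms of P_miss + beta * P_FA, with beta = 999.9 and one non-target trial per second of speech. The names `TermCounts` and `term_weighted_value` are hypothetical, and the record does not confirm that the paper's scoring matches this sketch in every detail.

```python
from dataclasses import dataclass

# Assumed weighting from the NIST 2006 STD evaluation convention.
BETA = 999.9


@dataclass
class TermCounts:
    """Per-term counts at the system's actual decision threshold (hypothetical type)."""
    n_true: int      # reference occurrences of the term
    n_correct: int   # detections that match a reference occurrence
    n_spurious: int  # detections that match nothing (false alarms)


def term_weighted_value(terms, speech_seconds, beta=BETA):
    """Return 1 - mean over terms of (P_miss + beta * P_FA).

    Terms with no reference occurrence are skipped because P_miss is
    undefined for them. Non-target trials are approximated as one per
    second of speech minus the true occurrences (an assumption here).
    """
    scored = [t for t in terms if t.n_true > 0]
    if not scored:
        raise ValueError("no term has a reference occurrence")
    total = 0.0
    for t in scored:
        p_miss = 1.0 - t.n_correct / t.n_true
        p_fa = t.n_spurious / (speech_seconds - t.n_true)
        total += p_miss + beta * p_fa
    return 1.0 - total / len(scored)
```

Under this definition a system that finds every occurrence with no false alarms scores 1.0 and a system that outputs nothing scores 0.0, which gives a sense of scale for the reported 0.325 and 0.479. Because of the large beta, a handful of false alarms per term can cost as much as missing a sizeable fraction of its true occurrences.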

Pages: 404 / +

Page count: 2

References (18 entries in total; entries 1-10 listed here):

[1] Allauzen C, 2004, P WORKSH INT APPR SP, P33

[2] Allauzen C, 2007, LECT NOTES COMPUT SC, V4783, P11

[3] Can D, 2009, AC SPEECH SIGN PROC, P3957

[4] Chen SF, 2003, EUROSPEECH, P2033

[5] Chia TK, 2008, Proceedings of the International Conference on Research and Development in Information Retrieval, P363, DOI 10.1145/1390334.1390397

[6] Cooper E, 2009, SIGIR

[7] Mamou J, 2007, SIGIR, P615

[8] Miller D, 2007, INTERSPEECH

[9] Parlak S, 2008, ICASSP

[10] Rastrow A, 2009, INTERSPEECH
