Experiments in spoken document retrieval using phoneme n-grams

Cited by: 10

Authors:

Ng, C

Wilkinson, R

Zobel, J

Affiliations:

[1] RMIT Univ, Dept Comp Sci, Melbourne, Vic 3001, Australia

[2] CSIRO, Div Math & Informat Sci, Melbourne, Vic 3053, Australia

Source:

SPEECH COMMUNICATION | 2000 / Vol. 32 / Issue 1-2

Funding:

Australian Research Council

Keywords:

DOI:

10.1016/S0167-6393(00)00024-8

Chinese Library Classification (CLC):

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In spoken document retrieval (SDR), speech recognition is applied to a collection to obtain either words or subword units, such as phonemes, that can be matched against queries. We have explored retrieval based on phoneme n-grams. The use of phonemes addresses the out-of-vocabulary (OOV) problem, while the use of n-grams allows approximate matching on inaccurate phoneme transcriptions. Our experiments explored the utility of word boundary information, stopword elimination, query expansion, varying the length of the phoneme sequences to be matched, and various combinations of n-grams of different lengths. Given word-based recognition (WBR), we can match queries to speech using a phoneme representation of the words, which lets us test whether the recognition or the matching process is more crucial to retrieval performance. Our experiments show that there is some deterioration in effectiveness, but that the particular form of matching is less vital if the sequence of phonemes is correct. When phone sequences are recognised directly, with higher error rates than for words, it is more important to select a good matching approach. Varying gram length trades precision against recall; combining n-grams of different lengths, in particular 3-grams and 4-grams, can improve retrieval. Overall, phoneme-based retrieval is not as effective as word-based retrieval, but is sufficient for situations in which word-based retrieval is either impractical or undesirable. © 2000 Elsevier Science B.V. All rights reserved.
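
As a rough illustration of the approximate matching described in the abstract, the sketch below splits phoneme sequences into overlapping n-grams and counts the grams a query shares with a document, summed over 3-grams and 4-grams. The function names, the phoneme symbols, and the plain overlap count are assumptions made for this example; the abstract does not specify the authors' weighting or ranking scheme, and this is not a reproduction of it.

```python
from collections import Counter


def phoneme_ngrams(phonemes, n):
    """Return all overlapping n-grams of a phoneme sequence as tuples."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def ngram_overlap(query, document, lengths=(3, 4)):
    """Count n-grams shared by query and document, summed over the given lengths.

    This is a plain overlap count chosen for illustration, not the
    similarity measure used in the paper.
    """
    score = 0
    for n in lengths:
        q = Counter(phoneme_ngrams(query, n))
        d = Counter(phoneme_ngrams(document, n))
        # Counter intersection keeps the minimum count of each shared gram.
        score += sum((q & d).values())
    return score


# Hypothetical phoneme strings, invented for this example.
query = ["s", "p", "iy", "ch"]
doc_with_error = ["dh", "ax", "s", "p", "iy", "sh"]  # last phoneme misrecognised
doc_unrelated = ["m", "y", "uw", "z", "ih", "k"]

print(ngram_overlap(query, doc_with_error))  # the 3-gram (s, p, iy) still matches
print(ngram_overlap(query, doc_unrelated))
```

In this toy case the misrecognised final phoneme breaks the only 4-gram match while one 3-gram survives, which hints at why shorter grams tolerate recognition errors better and longer grams match more selectively.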

Pages: 61-77

Page count: 17
