Utterance verification in continuous speech recognition: Decoding and training procedures

Times cited: 37

Authors:

Lleida, E [1]

Rose, RC [1]

Affiliation:

[1] Univ Zaragoza, Ctr Politecn Super, Zaragoza, Spain

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2000 / Vol. 8 / No. 2

Keywords:

acoustic modeling; confidence measures; discriminative training; large vocabulary continuous speech recognition; likelihood ratio; utterance verification;

DOI:

10.1109/89.824697

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper introduces a set of acoustic modeling and decoding techniques for utterance verification (UV) in hidden Markov model (HMM) based continuous speech recognition (CSR). Utterance verification in this work implies the ability to determine when portions of a hypothesized word string correspond to incorrectly decoded vocabulary words or to out-of-vocabulary words that may appear in an utterance. This capability is implemented here as a likelihood ratio (LR) based hypothesis testing procedure for verifying individual words in a decoded string. Two UV techniques are presented. The first is a procedure for estimating the parameters of UV models during training according to an optimization criterion that is directly related to the LR measure used in UV. The second is a speech recognition decoding procedure in which the "best" decoded path is defined as the one that optimizes an LR criterion. These techniques were evaluated in terms of their ability to improve UV performance on a speech dialog task over the public switched telephone network. The results of the experimental study presented in the paper show that LR based parameter estimation yields a significant improvement in UV performance for this task. The study also found that the LR based decoding procedure, when used in conjunction with models trained using the LR criterion, can provide as much as an 11% improvement in UV performance compared to existing UV procedures. Finally, the performance of the LR decoder was found to be highly dependent on the use of the LR criterion in training the acoustic models. Several observations are made in the paper concerning the formation of confidence measures for UV and the interaction of these techniques with the statistical language models used in ASR.
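The abstract describes word verification as an LR based hypothesis test, with training driven by a criterion tied to that LR. A minimal sketch of the general idea follows, assuming the decoder already supplies per-frame log-likelihoods for each hypothesized word under a target (correct word) model and an alternative (imposter/filler) model. The names `DecodedWord`, `word_log_lr`, `verify`, and `sigmoid_lr_loss` are hypothetical; the duration normalization and the sigmoid-smoothed verification loss are common choices in the UV literature and are not claimed to be the paper's exact formulation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DecodedWord:
    """One hypothesized word with per-frame scores (assumed decoder output)."""
    label: str
    target_loglik: List[float]  # log p(frame | target word model)
    alt_loglik: List[float]     # log p(frame | alternative model), hypothetical


def word_log_lr(word: DecodedWord) -> float:
    """Duration-normalized log likelihood ratio for a single word segment."""
    n = len(word.target_loglik)
    if n == 0 or n != len(word.alt_loglik):
        raise ValueError("need equal, non-empty frame score sequences")
    total = sum(t - a for t, a in zip(word.target_loglik, word.alt_loglik))
    return total / n


def verify(words: List[DecodedWord], threshold: float) -> List[Tuple[str, bool]]:
    """Accept a word when its log LR reaches the threshold, else reject it."""
    return [(w.label, word_log_lr(w) >= threshold) for w in words]


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_lr_loss(log_lr: float, is_correct: bool,
                    slope: float = 1.0, threshold: float = 0.0) -> float:
    """Smoothed 0/1 verification error (assumed MCE-style criterion).

    A correct word is penalized when its log LR falls below the threshold;
    an incorrect word is penalized when its log LR reaches the threshold.
    """
    d = log_lr - threshold
    if is_correct:
        d = -d
    return _sigmoid(slope * d)


# Illustrative scores only; these are not data from the paper.
example_words = [
    DecodedWord("boston", [-5.0, -4.0, -6.0], [-7.0, -6.5, -8.0]),
    DecodedWord("austin", [-9.0, -8.0], [-7.5, -7.0]),
]
decisions = verify(example_words, threshold=0.0)
print(decisions)  # [('boston', True), ('austin', False)]
```

Because `sigmoid_lr_loss` is differentiable in the log LR, its gradient could in principle drive the parameters of both the target and alternative models, which is the general motivation for tying the training criterion to the verification score.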


Pages: 126-139

Page count: 14

References: 21 entries
