A study on model-based error rate estimation for automatic speech recognition

Cited by: 7
Authors
Huang, CS
Wang, HC
Lee, CH
Affiliations
[1] Asia Pacific, Philips Speech Proc, Taipei 100, Taiwan
[2] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu 300, Taiwan
[3] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2003, Vol. 11, No. 6
Keywords
automatic speech recognition (ASR); divergence; error rate estimation; hidden Markov model (HMM); model-based misclassification measure
DOI
10.1109/TSA.2003.818030
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
A model-based framework for classification error rate estimation is proposed for speech and speaker recognition. It aims to predict the run-time performance of a hidden Markov model (HMM) based recognition system for a given task vocabulary and grammar without running recognition experiments on a separate set of testing samples. This is highly desirable both in theory and in practice. However, the error rate expression in HMM-based speech recognition systems has no closed-form solution, owing to the complexity of the multi-class comparison process and the need for dynamic time warping to handle various speech patterns. To alleviate this difficulty, we propose a one-dimensional model-based misclassification measure that evaluates the distance between a particular model of interest and a combination of many of its competing models. The error rate for a class characterized by the HMM is then the value of a smoothed zero-one error function evaluated at the misclassification measure. The overall error rate of the task vocabulary can then be computed as a function of all the available class error rates. The key is to evaluate the misclassification measure in terms of the parameters of environment-matched models without running recognition experiments, where the models are adapted using very limited data that could be just the testing utterance itself. In this paper, we show how the misclassification measure can be approximated by first computing the distance between two mixture Gaussian densities, then between two HMMs with mixture Gaussian state observation densities, and finally between two sequences of HMMs. The misclassification measure is then converted into a classification error rate. Compared with the error rates obtained by running actual experiments, the estimates produced by the proposed algorithm are accurate for many types of speech and speaker recognition problems. Based on the same framework, it is also demonstrated that the error rate of a recognition system in a noisy environment can be predicted.
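The pipeline described in the abstract (model-to-model distance, one-dimensional misclassification measure, smoothed zero-one error, overall error rate) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption and not the authors' method: the distance is a symmetric Kullback-Leibler divergence between single diagonal-covariance Gaussians (not mixtures or HMMs), the competitors are combined with a soft-minimum controlled by a hypothetical parameter `eta`, the smoothing is a sigmoid with hypothetical parameters `gamma` and `theta`, and the overall rate is a prior-weighted average. All function names are illustrative.

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians (closed form)."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )

def symmetric_divergence(mu_p, var_p, mu_q, var_q):
    """Symmetric divergence: KL(p || q) + KL(q || p). Assumed distance."""
    return (kl_diag_gauss(mu_p, var_p, mu_q, var_q)
            + kl_diag_gauss(mu_q, var_q, mu_p, var_p))

def misclassification_measure(distances, eta=1.0):
    """Assumed 1-D measure: negative soft-minimum of the distances from the
    target model to its competitors. Tends to -min(distances) as eta grows."""
    scaled = [-eta * d for d in distances]
    m = max(scaled)  # subtract the max for a numerically stable log-sum-exp
    log_mean = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean / eta

def smoothed_error(d, gamma=1.0, theta=0.0):
    """Sigmoid used here as a smoothed zero-one error function (assumption)."""
    return 1.0 / (1.0 + math.exp(-(gamma * d + theta)))

def overall_error(class_errors, priors):
    """Prior-weighted average of per-class error rates (assumption)."""
    total = sum(priors)
    return sum(e * p for e, p in zip(class_errors, priors)) / total

if __name__ == "__main__":
    # Toy target model and two competitors in a 1-D feature space.
    target = ([0.0], [1.0])
    competitors = [([1.0], [1.0]), ([3.0], [1.0])]
    dists = [symmetric_divergence(*target, *c) for c in competitors]
    d = misclassification_measure(dists, eta=2.0)
    print("distances:", dists)
    print("estimated class error:", smoothed_error(d))
```

In this sketch, well-separated competitors give a strongly negative measure and an estimated error near zero, while a competitor identical to the target gives a measure of zero and an estimated error of 0.5 with the default `theta`. In practice the smoothing parameters would need to be calibrated, which the abstract does not describe.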
Pages: 581-589
Number of pages: 9