A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER)

Cited by: 468
Authors
Fiscus, JG [1]
Affiliations
[1] Natl Inst Stand & Technol, Gaithersburg, MD 20899 USA
Source
1997 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, PROCEEDINGS | 1997
DOI
10.1109/ASRU.1997.659110
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes a system developed at NIST to produce a composite Automatic Speech Recognition (ASR) system output when the outputs of multiple ASR systems are available, and for which, in many cases, the composite ASR output has a lower error rate than any of the individual systems. The system implements a "voting" or rescoring process to reconcile differences in ASR system outputs. We refer to this system as the NIST Recognizer Output Voting Error Reduction (ROVER) system. As additional knowledge sources (e.g., acoustic and language models) are added to an ASR system, error rates typically decrease. This paper describes a post-recognition process which models the output generated by multiple ASR systems as independent knowledge sources that can be combined and used to generate an output with a reduced error rate. To accomplish this, the outputs of multiple ASR systems are combined into a single, minimal cost word transition network (WTN) via iterative applications of dynamic programming (DP) alignments. The resulting network is searched by an automatic rescoring or "voting" process that selects an output sequence with the lowest score.
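The two stages named in the abstract, iterative DP alignment into a WTN followed by voting, can be illustrated with a minimal Python sketch. It is not the NIST implementation: the function names, the `@` null-arc symbol, the unit edit costs, and the plain majority vote (most frequent word per slot, ties going to the earlier system) are all assumptions made for illustration, and the abstract does not specify the scoring function.

```python
from collections import Counter

NULL = "@"  # placeholder for a null (empty) arc; the symbol is an assumption


def align(wtn, hyp, n_prev):
    """Merge one hypothesis into the network using edit-distance DP.

    wtn is a list of slots; each slot holds one word (or NULL) per system
    merged so far. n_prev is the number of systems already in the network.
    """
    rows, cols = len(wtn) + 1, len(hyp) + 1
    # cost[i][j]: minimal cost of aligning wtn[:i] with hyp[:j]
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i
    for j in range(1, cols):
        cost[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0 if hyp[j - 1] in wtn[i - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitute
                             cost[i - 1][j] + 1,        # hypothesis lacks a word
                             cost[i][j - 1] + 1)        # hypothesis has an extra word
    # Trace back through the cost table to build the merged network.
    merged = []
    i, j = len(wtn), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if hyp[j - 1] in wtn[i - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                merged.append(wtn[i - 1] + [hyp[j - 1]])
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            merged.append(wtn[i - 1] + [NULL])  # null arc for the new system
            i -= 1
        else:
            merged.append([NULL] * n_prev + [hyp[j - 1]])  # new slot in the network
            j -= 1
    merged.reverse()
    return merged


def build_wtn(hypotheses):
    """Iteratively align every hypothesis against the growing network."""
    wtn = [[word] for word in hypotheses[0]]
    for n_prev, hyp in enumerate(hypotheses[1:], start=1):
        wtn = align(wtn, hyp, n_prev)
    return wtn


def vote(wtn):
    """Pick the most frequent entry in each slot; drop slots won by NULL."""
    output = []
    for slot in wtn:
        word, _ = Counter(slot).most_common(1)[0]
        if word != NULL:
            output.append(word)
    return output


hyps = [
    "the cat sat on the mat".split(),
    "the cat sad on mat".split(),
    "a cat sat on the mat".split(),
]
print(" ".join(vote(build_wtn(hyps))))  # prints "the cat sat on the mat"
```

In this toy case each system makes a different error, so the per-slot majority recovers a sequence that none of the erroneous hypotheses produced on its own, which is the effect the abstract describes. The order in which hypotheses are merged can change the alignment, so a different ordering may give a different network.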
Pages: 347-354
Number of pages: 8