Progress in the CU-HTK broadcast news transcription system

Cited by: 52
Authors
Gales, Mark J. F. [1]
Kim, Do Yeong
Woodland, Philip C.
Chan, Ho Yin
Mrva, David
Sinha, Rohit
Tranter, Sue E.
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
[2] VoiceSignal, Woburn, MA 01801 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / Issue 05
Keywords
automatic speech recognition; broadcast news (BN) transcription; diarization;
DOI
10.1109/TASL.2006.878264
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
Broadcast news (BN) transcription has been a challenging research area for many years. In the last couple of years, the availability of large amounts of roughly transcribed acoustic training data and advanced model training techniques has offered the opportunity to greatly reduce the error rate on this task. This paper describes the design and performance of BN transcription systems which make use of these developments. First, the effects of using lightly supervised training data and advanced acoustic modeling techniques are discussed. The design of a real-time broadcast news recognition system using these new models is then detailed. As system combination has been found to yield large gains in performance, a range of frameworks that allow multiple recognition outputs to be combined are described next. These include the use of multiple types of acoustic models and multiple segmentations. As a contrast, a system developed by multiple sites, allowing cross-site combination, the "SuperEARS" system, is also described. The various models and recognition configurations are evaluated using several recent BN development and evaluation test sets. These new BN transcription systems can give gains of over 25% relative to the CU-HTK 2003 BN system.
Pages: 1513-1525
Number of pages: 13