Combining Tandem and Hybrid Systems for Improved Speech Recognition and Keyword Spotting on Low Resource Languages

Cited by: 0
Authors
Rath, Shakti P. [1]
Knill, Kate M. [1]
Ragni, Anton [1]
Gales, Mark J. F. [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Trumpington St, Cambridge CB2 1PZ, England
Source
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | 2014
Keywords
keyword spotting; deep neural network; Tandem; Hybrid
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years there has been significant interest in Automatic Speech Recognition (ASR) and Keyword Spotting (KWS) systems for low resource languages. One of the driving forces for this research direction is the IARPA Babel project. This paper examines the performance gains that can be obtained by combining two forms of deep neural network ASR systems, Tandem and Hybrid, for both ASR and KWS using data released under the Babel project. Baseline systems are described for the five Option Period 1 languages: Assamese, Bengali, Haitian Creole, Lao, and Zulu. All the ASR systems share common attributes, for example deep neural network configurations and decision trees based on rich phonetic questions and state-position root nodes. The baseline ASR and KWS performance of the Hybrid and Tandem systems is compared for both the "full" (approximately 80 hours of training data) and "limited" (approximately 10 hours of training data) language packs. Combining the two systems yields consistent performance gains for KWS in all configurations.
Pages: 835-839
Number of pages: 5