SEMI-SUPERVISED TRAINING IN LOW-RESOURCE ASR AND KWS

Cited: 0
Authors
Metze, Florian [1 ,2 ]
Gandhe, Ankur [1 ,2 ]
Miao, Yajie [1 ,2 ]
Sheikh, Zaid [1 ,2 ]
Wang, Yun [1 ,2 ]
Xu, Di [1 ,2 ]
Zhang, Hao [1 ,2 ]
Kim, Jungsuk [3 ,4 ]
Lane, Ian [3 ,4 ]
Lee, Won Kyum [3 ,4 ]
Stueker, Sebastian [5 ]
Mueller, Markus [5 ]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Language Technol Inst, Moffett Field, CA USA
[3] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[4] Carnegie Mellon Univ, Dept Elect & Comp Engn, Moffett Field, CA USA
[5] Karlsruhe Inst Technol, Karlsruhe, Germany
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Funding
U.S. National Science Foundation
Keywords
spoken term detection; automatic speech recognition; low-resource LTs; semi-supervised training
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In particular for "low resource" Keyword Search (KWS) and Speech-to-Text (STT) tasks, more untranscribed test data may be available than training data. Several approaches have been proposed to make this data useful during system development, even when initial systems have Word Error Rates (WER) above 70%. In this paper, we present a set of experiments on telephony-quality speech in the low-resource languages Assamese, Bengali, Lao, Haitian, Zulu, and Tamil, demonstrating the impact such techniques can have, in particular when robust bottleneck features are learned on the test data. In the case of Tamil, where significantly more test data than training data is available, we integrated semi-supervised training with speaker adaptation on the test data and achieved significant additional improvements in STT and KWS.
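The semi-supervised training the abstract refers to can be read as a confidence-filtered self-training loop: decode the untranscribed audio with the current system, keep only hypotheses the decoder is confident about, and retrain on the union of supervised and automatically transcribed data. The Python sketch below illustrates one such round under stated assumptions; the decoder and trainer interfaces (decode_with_confidence, train_model) and the threshold value 0.7 are hypothetical placeholders standing in for a real STT toolkit, not the authors' implementation.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Utterance:
    audio_id: str
    transcript: str    # human reference or automatic hypothesis
    confidence: float  # 1.0 for hand transcripts, decoder posterior otherwise

def self_training_round(
        supervised: List[Utterance],
        untranscribed_ids: List[str],
        decode_with_confidence: Callable[[str], Tuple[str, float]],  # placeholder decoder
        train_model: Callable[[List[Utterance]], None],              # placeholder trainer
        threshold: float = 0.7) -> List[Utterance]:                  # assumed value; tune on a dev set
    # Decode the untranscribed audio with the current model, keep only
    # hypotheses whose confidence clears the threshold, then retrain
    # (e.g. the bottleneck feature extractor) on supervised plus
    # automatically transcribed data.
    auto: List[Utterance] = []
    for audio_id in untranscribed_ids:
        hypothesis, conf = decode_with_confidence(audio_id)
        if conf >= threshold:  # confidence-based data selection
            auto.append(Utterance(audio_id, hypothesis, conf))
    train_model(supervised + auto)
    return auto

In a condition like the paper's Tamil setup, where untranscribed test data substantially exceeds the training data, rounds of this kind would be interleaved with speaker adaptation on the test audio, so each pass both enlarges the training pool and adapts the models to the test speakers.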
Pages: 4699-4703
Number of pages: 5