DOMAIN ADAPTATION OF END-TO-END SPEECH RECOGNITION IN LOW-RESOURCE SETTINGS

Cited by: 0
Authors
Samarakoon, Lahiru [1 ]
Mak, Brian [2 ]
Lam, Albert Y. S. [1 ]
Affiliations
[1] Fano Labs, Hong Kong, People's Republic of China
[2] Hong Kong University of Science and Technology, Hong Kong, People's Republic of China
Source
2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018) | 2018
Keywords
ASR; end-to-end speech recognition; domain adaptation
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
End-to-end automatic speech recognition (ASR) simplifies the traditional ASR system-building pipeline by eliminating the need for multiple components and for expert linguistic knowledge to create pronunciation dictionaries. End-to-end ASR is therefore a good fit when building systems for new domains. However, one major drawback of end-to-end ASR is that it requires a larger amount of labeled speech than traditional methods. In this paper, we therefore explore domain adaptation approaches for end-to-end ASR in low-resource settings. We show that joint domain identification and speech recognition, performed by inserting a domain symbol at the beginning of the label sequence, as well as factorized hidden layer adaptation and a domain-specific gating mechanism, improve performance on a low-resource target domain. Furthermore, we show that the proposed adaptation methods are robust to an unseen domain when only 3 hours of untranscribed data are available, with relative improvements of up to 8.7%.
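The abstract names three adaptation techniques. As a rough, hypothetical sketch of two of them (the domain symbol prepended to the label sequence, and the domain-specific gating mechanism), the PyTorch snippet below illustrates the general idea under stated assumptions; all names, token ids, and tensor shapes (DOMAIN_TOKENS, DomainGatedLayer, etc.) are illustrative inventions, not the authors' implementation, and factorized hidden layer adaptation is omitted.

# Hypothetical sketch, not the paper's code: (a) prepend a domain symbol
# to the label sequence; (b) a domain-specific gating layer.

import torch
import torch.nn as nn

# (a) Joint domain identification: reserve one output token per domain and
# insert it at the start of each label sequence, so the model must predict
# the domain before transcribing. Token ids here are made up for illustration.
DOMAIN_TOKENS = {"finance": 1, "callcenter": 2}

def prepend_domain(labels, domain):
    """Insert the domain symbol at the beginning of the label sequence."""
    return [DOMAIN_TOKENS[domain]] + list(labels)

# (b) Domain-specific gating: a sigmoid gate computed from a learned domain
# embedding scales the hidden activations element-wise, letting each domain
# modulate a shared network.
class DomainGatedLayer(nn.Module):
    def __init__(self, hidden_dim, num_domains, domain_dim=32):
        super().__init__()
        self.domain_emb = nn.Embedding(num_domains, domain_dim)
        self.gate = nn.Linear(domain_dim, hidden_dim)

    def forward(self, h, domain_id):
        # h: (batch, time, hidden_dim); domain_id: (batch,)
        g = torch.sigmoid(self.gate(self.domain_emb(domain_id)))  # (batch, hidden_dim)
        return h * g.unsqueeze(1)  # broadcast the gate over the time axis

# Minimal usage example with random tensors.
layer = DomainGatedLayer(hidden_dim=256, num_domains=2)
h = torch.randn(4, 100, 256)        # e.g. encoder activations
d = torch.randint(0, 2, (4,))       # per-utterance domain ids
print(layer(h, d).shape)            # torch.Size([4, 100, 256])
print(prepend_domain([17, 42, 9], "finance"))  # [1, 17, 42, 9]

One design note: because the gate is computed only from the domain embedding, adapting to a new domain in this sketch amounts to learning one small embedding vector, which is plausible with very little target-domain data.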
Pages: 382-388
Number of pages: 7