Do End-to-End Speech Recognition Models Care About Context?

Cited by: 5
Authors
Borgholt, Lasse [1, 2]
Havtorn, Jakob D. [2 ]
Agic, Zeljko [2 ]
Sogaard, Anders [1 ]
Maaloe, Lars [2 ]
Igel, Christian [1 ]
Affiliations
[1] Univ Copenhagen, Dept Comp Sci, Copenhagen, Denmark
[2] Corti, Copenhagen, Denmark
Source
INTERSPEECH 2020 | 2020
Keywords
automatic speech recognition; end-to-end speech recognition; connectionist temporal classification; attention-based encoder-decoder
DOI
10.21437/Interspeech.2020-1750
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
The two most common paradigms for end-to-end speech recognition are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. It has been argued that the latter is better suited for learning an implicit language model. We test this hypothesis by measuring temporal context sensitivity and evaluate how the models perform when we constrain the amount of contextual information in the audio input. We find that the AED model is indeed more context sensitive, but that the gap can be closed by adding self-attention to the CTC model. Furthermore, the two models perform similarly when contextual information is constrained. Finally, in contrast to previous research, our results show that the CTC model is highly competitive on WSJ and LibriSpeech without the help of an external language model.
Pages: 4352-4356
Number of pages: 5