TED-LIUM 3: Twice as Much Data and Corpus Repartition for Experiments on Speaker Adaptation

Cited by: 179
Authors
Hernandez, Francois [1 ]
Nguyen, Vincent [1 ]
Ghannay, Sahar [2 ]
Tomashenko, Natalia [2 ]
Esteve, Yannick [2 ]
Affiliations
[1] Ubiqus, Paris, France
[2] Univ Le Mans, LIUM, Le Mans, France
Source
SPEECH AND COMPUTER (SPECOM 2018) | 2018 / Volume 11096
Keywords
Speech recognition; Open-source corpus; Deep learning; Speaker adaptation; TED-LIUM
DOI
10.1007/978-3-319-99579-3_21
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present the TED-LIUM release 3 corpus (available at https://lium.univ-lemans.fr/ted-lium3/), dedicated to speech recognition in English, which more than doubles the amount of data available for training acoustic models compared with TED-LIUM 2. We present recent developments in Automatic Speech Recognition (ASR) systems evaluated against the two previous releases of the TED-LIUM corpus, from 2012 and 2014. We demonstrate that increasing the transcribed speech training data from 207 to 452 hours benefits end-to-end ASR systems far more than state-of-the-art HMM-based ones, even though the HMM-based system still outperforms the end-to-end system at 452 hours of audio training data, with Word Error Rates (WER) of 6.7% and 13.7%, respectively. Finally, we propose two repartitions of the TED-LIUM release 3 corpus: the legacy repartition, identical to that of release 2, and a new repartition calibrated and designed for experiments on speaker adaptation. As with the first two releases, the TED-LIUM 3 corpus will be freely available to the research community.
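The abstract compares systems by Word Error Rate (WER). As context only (not taken from the paper), WER is conventionally the word-level edit distance between reference and hypothesis transcripts divided by the number of reference words; the minimal Python sketch below, with a hypothetical wer() helper, illustrates that standard computation.

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution over four reference words -> WER = 0.25 (25%).
print(wer("the cat sat down", "the cat sat town"))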
Pages: 198-208
Page count: 11