Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training

Cited by: 4
Authors
Biswas, Astik [1]
Menon, Raghav [1]
van der Westhuizen, Ewald [1]
Niesler, Thomas [1]
Affiliations
[1] Stellenbosch Univ, Dept Elect & Elect Engn, Stellenbosch, South Africa
Source
INTERSPEECH 2019 | 2019
Keywords
speech recognition; Somali; semi-supervised; TDNN-F; under-resourced language
DOI
10.21437/Interspeech.2019-1328
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
We present improvements in automatic speech recognition (ASR) for Somali, a currently extremely under-resourced language. This work forms part of a continuing United Nations (UN) effort to employ ASR-based keyword spotting systems to support humanitarian relief programmes in rural Africa. Using just 1.57 hours of annotated speech data as a seed corpus, we increase the pool of training data by applying semi-supervised training to 17.55 hours of untranscribed speech. We use factorised time-delay neural networks (TDNN-F) for acoustic modelling, since these have recently been shown to be effective in resource-scarce situations. Three semi-supervised training passes were performed, with the decoded output from each pass used for acoustic model training in the subsequent pass. The automatic transcriptions from the best-performing pass were used for language model augmentation. To ensure the quality of the automatic transcriptions, a threshold on decoder confidence is applied. The acoustic and language models obtained from the semi-supervised approach show significant improvements in word error rate (WER) and perplexity compared to the baseline. Incorporating the automatically generated transcriptions yields a 6.55% improvement in language model perplexity. The use of 17.55 hours of Somali acoustic data in semi-supervised training yields a relative improvement of 7.74% over the baseline.
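The confidence-based selection step and the relative-improvement figures described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Hypothesis` type, the function names, the threshold of 0.7, and all confidence and metric values below are hypothetical and chosen only for illustration. The abstract does not state the threshold actually used.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One automatically transcribed utterance (hypothetical structure)."""
    utt_id: str
    text: str
    confidence: float  # decoder confidence, assumed to lie in [0, 1]


def select_confident(hyps, threshold):
    """Keep only transcriptions whose confidence is at or above the threshold."""
    return [h for h in hyps if h.confidence >= threshold]


def relative_improvement(baseline, new):
    """Relative reduction, in percent, of a lower-is-better metric
    such as WER or perplexity."""
    return 100.0 * (baseline - new) / baseline


# Illustrative values only; none of these come from the paper.
hyps = [
    Hypothesis("utt1", "placeholder transcription one", 0.91),
    Hypothesis("utt2", "placeholder transcription two", 0.42),
    Hypothesis("utt3", "placeholder transcription three", 0.78),
]

# Assumed threshold of 0.7: utt2 is discarded, utt1 and utt3 are kept
# and could be added to the training pool for the next pass.
kept = select_confident(hyps, threshold=0.7)

# A metric dropping from 50.0 to 45.0 is a 10% relative improvement.
gain = relative_improvement(50.0, 45.0)
```

In a multi-pass setup such as the one the abstract describes, a filter of this kind would be applied to the decoded output of each pass before retraining, so that low-confidence transcriptions do not enter the training pool.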
Pages: 3008-3012
Number of pages: 5
Related papers
22 in total
[1]  
Abdillahi N., 2006, P INTERSPEECH
[2]  
[Anonymous], 2002, P ICSLP
[3]  
Burnap P., 2015, P ACM HT
[4]  
G. P. P. Series, 2015, GLOB PULS PROJ SER
[5]  
G. P. P. Series, 2015, GLOB PULS PROJ SER
[6]  
G. P. P. Series, 2014, GLOB PULS PROJ SER
[7]  
Ghoshal A, 2013, INT CONF ACOUST SPEE, P7319, DOI 10.1109/ICASSP.2013.6639084
[8]  
Goldhahn D, 2012, LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, P759
[9]   Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system [J].
Kamper, Herman ;
de Wet, Febe ;
Hain, Thomas ;
Niesler, Thomas .
COMPUTER SPEECH AND LANGUAGE, 2014, 28 (06) :1255-1268
[10]  
Ko T, 2015, 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, P3586