HMM/SVM segmentation and labelling of Arabic speech for speech recognition applications

Cited by: 14
Authors
Frihia H. [1, 2]
Bahi H. [1]
Affiliations
[1] Computer Science Department, University Badji Mokhtar, Annaba
[2] LabGED Laboratory, University Badji Mokhtar, Annaba
Keywords
Arabic language; HMM; Speech recognition; Speech segmentation; SVM
DOI
10.1007/s10772-017-9427-z
Abstract
Building a large vocabulary continuous speech recognition (LVCSR) system requires many hours of segmented and labelled speech data. The Arabic language, like many other low-resourced languages, lacks such data, but automatic segmentation has proved to be a good alternative for making these resources available. In this paper, we propose combining hidden Markov models (HMMs) and support vector machines (SVMs) to segment the speech waveform into phoneme units and label them. The HMMs generate the sequence of phonemes and their boundaries; the SVM then refines the boundaries and corrects the labels. The resulting segmented and labelled units can serve as a training set for speech recognition applications. The HMM/SVM segmentation algorithm is assessed using both the hit rate and the word error rate (WER); the resulting scores are compared with those obtained from manual segmentation and from the well-known embedded learning algorithm. The results show that the speech recognizer built on the HMM/SVM segmentation outperforms the one built on the embedded learning segmentation by about 0.05% in WER, even with a noisy background. © 2017, Springer Science+Business Media New York.
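The abstract describes a two-stage pipeline (HMM boundaries refined by an SVM) and two evaluation measures. The sketch below illustrates these ideas under stated assumptions; it is not the authors' implementation. `refine_boundaries` assumes the SVM acts as a frame-level boundary scorer searched within a small window around each HMM boundary, with the trained scorer replaced by a hypothetical function argument. `hit_rate` assumes the common definition: the percentage of automatic boundaries lying within a tolerance of the matching manual boundary (20 ms here, an assumed value). `word_error_rate` is the standard edit-distance WER.

```python
from typing import Callable, List, Sequence


def refine_boundaries(
    hmm_boundaries: Sequence[int],
    boundary_score: Callable[[int], float],
    window: int = 3,
) -> List[int]:
    """Move each HMM boundary (a frame index) to the best-scoring frame
    within +/- `window` frames, keeping boundaries strictly increasing.

    `boundary_score` is a hypothetical stand-in for a trained classifier's
    decision function (e.g. an SVM); it is not an API from the paper.
    """
    refined: List[int] = []
    prev = -1
    for b in hmm_boundaries:
        lower = max(0, b - window, prev + 1)  # never cross the previous boundary
        upper = max(b + window, lower)
        best = max(range(lower, upper + 1), key=boundary_score)
        refined.append(best)
        prev = best
    return refined


def hit_rate(predicted_ms: Sequence[float], reference_ms: Sequence[float],
             tolerance_ms: float = 20.0) -> float:
    """Percentage of boundaries within `tolerance_ms` of the manual ones.

    Assumes both lists are aligned one-to-one (same phoneme sequence).
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("boundary lists must be aligned one-to-one")
    if not reference_ms:
        return 0.0
    hits = sum(abs(p - r) <= tolerance_ms for p, r in zip(predicted_ms, reference_ms))
    return 100.0 * hits / len(reference_ms)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: (substitutions + deletions + insertions) / len(reference)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between ref[:i-1] and hyp[:j] (rolling DP row)
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # Toy scorer peaking at frame 12 (stands in for an SVM decision value).
    print(refine_boundaries([10, 30], lambda f: -abs(f - 12), window=3))
    print(hit_rate([100, 215, 330], [110, 240, 335]))
    print(word_error_rate("a b c d", "a x c"))
```

With the toy scorer, the first boundary moves from frame 10 to 12 and the second moves to 27, the edge of its window closest to the peak. The hit rate is two of three boundaries (about 66.7%), and the WER is 50% (one substitution and one deletion over four reference words). Restricting the search to a small window keeps the HMM's phoneme order intact while letting the classifier adjust boundary positions locally.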
Pages: 563-573
Page count: 10