Improving dysarthric speech recognition using empirical mode decomposition and convolutional neural network

Cited by: 0
Authors
Mohammed Sidi Yakoub
Sid-ahmed Selouani
Brahim-Fares Zaidi
Asma Bouchair
Affiliations
[1] Moncton University, LSCSP Research Laboratory
[2] UMCS Shippagan, LCPTS Research Laboratory
[3] USTHB University
[4] USTHB University
Source
EURASIP Journal on Audio, Speech, and Music Processing, vol. 2020
Keywords
Dysarthria; Empirical mode decomposition; Hurst mode selection; Convolutional neural network
DOI
Not available
Abstract
In this paper, we combine empirical mode decomposition with Hurst-based mode selection (EMDH) and a deep learning architecture based on a convolutional neural network (CNN) to improve the recognition of dysarthric speech. The EMDH speech enhancement technique serves as a preprocessing step that improves the quality of the dysarthric speech. Mel-frequency cepstral coefficients are then extracted from the EMDH-processed speech and used as input features to a CNN-based recognizer. Results obtained on the Nemours corpus of dysarthric speech demonstrate the effectiveness of the proposed EMDH-CNN approach. Compared with baseline systems that use hidden Markov models with Gaussian mixture models (HMM-GMMs) and a CNN without an enhancement module, the EMDH-CNN system increases the overall accuracy by 20.72% and 9.95%, respectively, in a k-fold cross-validation experimental setup.
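The abstract does not describe how the Hurst-based mode selection works in detail. Purely as an illustration, the sketch below estimates a Hurst exponent with classical rescaled-range (R/S) analysis and keeps the modes whose exponent falls at or below a threshold. The function names `hurst_rs` and `select_modes`, the R/S estimator, the threshold, and the direction of the selection rule are all assumptions made for this example and are not taken from the paper. The decomposition into modes, the MFCC extraction, and the CNN recognizer are not shown.

```python
import math
import random


def rescaled_range(window):
    """Return the R/S statistic of one window, or None if it is constant."""
    n = len(window)
    mean = sum(window) / n
    cum = lo = hi = sq = 0.0
    for value in window:
        dev = value - mean
        cum += dev              # cumulative deviation from the window mean
        lo = min(lo, cum)
        hi = max(hi, cum)
        sq += dev * dev
    std = math.sqrt(sq / n)
    return None if std == 0.0 else (hi - lo) / std


def hurst_rs(series, min_window=8):
    """Estimate the Hurst exponent as the slope of log(R/S) against log(n)."""
    log_n, log_rs = [], []
    n = min_window
    while n <= len(series) // 2:
        # Average R/S over non-overlapping windows of length n.
        stats = [rescaled_range(series[i:i + n])
                 for i in range(0, len(series) - n + 1, n)]
        stats = [s for s in stats if s is not None and s > 0.0]
        if stats:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(stats) / len(stats)))
        n *= 2
    if len(log_n) < 2:
        raise ValueError("series too short for R/S analysis")
    mean_x = sum(log_n) / len(log_n)
    mean_y = sum(log_rs) / len(log_rs)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_n, log_rs))
    den = sum((x - mean_x) ** 2 for x in log_n)
    return num / den            # least-squares slope


def select_modes(modes, threshold):
    """Hypothetical rule (an assumption): keep modes with H <= threshold."""
    return [i for i, mode in enumerate(modes) if hurst_rs(mode) <= threshold]


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    walk, total = [], 0.0
    for step in noise:          # integrated noise: a strongly persistent series
        total += step
        walk.append(total)
    print("H(noise) =", hurst_rs(noise))
    print("H(walk)  =", hurst_rs(walk))
```

In a pipeline like the one the abstract outlines, the retained modes would presumably be summed to rebuild the enhanced signal before feature extraction. The actual selection criterion and threshold should be taken from the paper itself.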