Improving dysarthric speech recognition using empirical mode decomposition and convolutional neural network

Cited by: 0

Authors:

Mohammed Sidi Yakoub

Sid-ahmed Selouani

Brahim-Fares Zaidi

Asma Bouchair

Affiliations:

[1] Moncton University, LSCSP Research Laboratory

[2] UMCS Shippagan, LCPTS Research Laboratory

[3] USTHB University

[4] USTHB University

Source:

EURASIP Journal on Audio, Speech, and Music Processing, vol. 2020

Keywords:

Dysarthria; Empirical mode decomposition; Hurst mode selection; Convolutional neural network

Abstract:

In this paper, we combine empirical mode decomposition with Hurst-based mode selection (EMDH) and a deep learning architecture based on a convolutional neural network (CNN) to improve the recognition of dysarthric speech. EMDH is applied as a speech enhancement preprocessing step to improve the quality of the dysarthric speech. Mel-frequency cepstral coefficients (MFCCs) are then extracted from the EMDH-processed speech and used as input features to a CNN-based recognizer. The effectiveness of the proposed EMDH-CNN approach is demonstrated on the Nemours corpus of dysarthric speech. Compared with baseline systems that use hidden Markov models with Gaussian mixture models (HMM-GMM) and a CNN without an enhancement module, the EMDH-CNN system increases the overall accuracy by 20.72% and 9.95%, respectively, under a k-fold cross-validation experimental setup.
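The abstract does not say how the Hurst exponent behind the mode selection is estimated. As a rough illustration only, the sketch below estimates it for a one-dimensional signal with classical rescaled-range (R/S) analysis, which is a swapped-in technique and not necessarily the estimator used in the paper. The function name `hurst_rs` and the `min_window` parameter are hypothetical. In an EMDH-style pipeline, such an estimate could be computed for each mode produced by the decomposition, but the actual selection rule is not given in this record and is not reproduced here.

```python
import numpy as np

def hurst_rs(x, min_window=16):
    """Estimate the Hurst exponent of a 1-D signal by rescaled-range (R/S) analysis.

    Assumption: this is a generic textbook estimator, not the paper's method.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)

    # Window sizes: powers of two from min_window up to half the signal length.
    sizes = []
    w = min_window
    while w <= n // 2:
        sizes.append(w)
        w *= 2
    if len(sizes) < 2:
        raise ValueError("signal too short for R/S analysis")

    log_w, log_rs = [], []
    for w in sizes:
        rs_values = []
        # Split the signal into non-overlapping windows of length w.
        for start in range(0, n - w + 1, w):
            seg = x[start:start + w]
            dev = np.cumsum(seg - seg.mean())  # cumulative deviation from the mean
            r = dev.max() - dev.min()          # range of the cumulative deviation
            s = seg.std()                      # standard deviation of the window
            if s > 0:
                rs_values.append(r / s)
        if rs_values:
            log_w.append(np.log(w))
            log_rs.append(np.log(np.mean(rs_values)))

    # The slope of log(R/S) against log(window size) is the Hurst estimate.
    slope, _ = np.polyfit(log_w, log_rs, 1)
    return float(slope)
```

With this estimator, white noise should give a value in the neighborhood of 0.5, while a strongly persistent signal such as a random walk should give a value close to 1. R/S estimates are known to be biased for short windows, so the numbers are indicative rather than exact.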
