Deep convolutional neural network for detection of pathological speech

Times cited: 11

Authors:

Vavrek, Lukas [1]

Hires, Mate [1]

Kumar, Dinesh [2]

Drotar, Peter [1]

Affiliations:

[1] Tech Univ Kosice, Dept Comp & Informat, Fac Elect Engn & Informat, Kosice, Slovakia

[2] RMIT Univ, Sch Engn, Melbourne, Vic, Australia

Source:

2021 IEEE 19TH WORLD SYMPOSIUM ON APPLIED MACHINE INTELLIGENCE AND INFORMATICS (SAMI 2021) | 2021

Keywords:

convolutional neural network; deep learning; pathological voice detection; transfer learning;

DOI:

10.1109/SAMI50585.2021.9378656

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper investigates the use of deep neural networks (DNNs) for the detection of pathological speech. The work is based on transfer learning with the state-of-the-art VGG16 convolutional neural network, and several approaches were trialed. We tested the different architectures on the Saarbrucken Voice Database (SVD). To overcome limitations due to language and education, the analysis was restricted to the /a/, /i/ and /u/ vowel subsets, sustained at natural pitch. The study considered only diseases classified as organic dysphonia. We trained multiple simple networks separately on the different vowel subsets and combined them into a single model ensemble. The model ensemble achieved an accuracy of 82% in pathological speech detection. Our results therefore show that pre-trained convolutional neural networks can be used for transfer learning when the input is the spectrogram representation of the voice signal. This is significant because it avoids the need for the very large datasets required to train a DNN from scratch, and it makes the approach suitable for computerized speech analysis that is not limited by the patients' language skills.
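The abstract states that networks trained separately on the /a/, /i/ and /u/ subsets were combined into a single ensemble, but it does not say how their outputs were merged. The sketch below assumes simple soft voting: each vowel-specific model outputs a probability that the speaker is pathological, the three probabilities are averaged, and a 0.5 threshold gives the final label. The function names `ensemble_probability` and `classify`, the threshold, and the example probabilities are all hypothetical; the spectrogram computation and the VGG16-based models themselves are not shown.

```python
from statistics import mean

# Vowel subsets used in the study; one (hypothetical) model per vowel.
VOWELS = ("a", "i", "u")


def ensemble_probability(per_vowel_probs: dict[str, float]) -> float:
    """Soft voting (an assumed combination rule): average the
    pathological-class probabilities of the vowel-specific models
    for one speaker."""
    missing = [v for v in VOWELS if v not in per_vowel_probs]
    if missing:
        raise ValueError(f"missing predictions for vowels: {missing}")
    return mean(per_vowel_probs[v] for v in VOWELS)


def classify(per_vowel_probs: dict[str, float], threshold: float = 0.5) -> str:
    """Label a speaker 'pathological' if the averaged probability reaches
    the assumed decision threshold, otherwise 'healthy'."""
    p = ensemble_probability(per_vowel_probs)
    return "pathological" if p >= threshold else "healthy"


if __name__ == "__main__":
    # Illustrative model outputs, not results from the paper.
    speaker = {"a": 0.72, "i": 0.41, "u": 0.66}
    print(round(ensemble_probability(speaker), 4))  # 0.5967
    print(classify(speaker))  # pathological
```

Averaging probabilities lets a confident vowel model outweigh an uncertain one; majority voting over hard per-vowel labels would be an equally plausible combination rule that the abstract does not rule out.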


Pages: 245-249

Number of pages: 5
