Detection of Speech Impairments Using Cepstrum, Auditory Spectrogram and Wavelet Time Scattering Domain Features

Cited by: 43
Authors
Lauraitis, Andrius [1]
Maskeliunas, Rytis [2]
Damasevicius, Robertas [2, 3]
Krilavicius, Tomas [2, 4]
Affiliations
[1] Kaunas Univ Technol, Dept Multimedia Engn, LT-44249 Kaunas, Lithuania
[2] Vytautas Magnus Univ, Dept Appl Informat, LT-44404 Kaunas, Lithuania
[3] Silesian Tech Univ, Fac Appl Math, PL-44100 Gliwice, Poland
[4] Baltic Inst Adv Technol, LT-01403 Vilnius, Lithuania
Keywords
Neural impairment; mobile app; deep learning; wavelet scattering; decision support; speech processing; digital health; Internet of Medical Things; AUTOMATIC ASSESSMENT; PARKINSONS-DISEASE; HUNTINGTON DISEASE; INTELLIGIBILITY; CLASSIFICATION; PITCH; INDIVIDUALS; RECOGNITION; REVERBERANT; QUALITY;
DOI
10.1109/ACCESS.2020.2995737
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We adopt a Bidirectional Long Short-Term Memory (BiLSTM) neural network and a Wavelet Scattering Transform with Support Vector Machine (WST-SVM) classifier to detect speech impairments in patients at the early stage of central nervous system disorders (CNSD). The study includes 339 voice samples collected from 15 subjects: 7 patients with early-stage CNSD (3 Huntington, 1 Parkinson, 1 cerebral palsy, 1 post-stroke, 1 early dementia) and 8 healthy subjects. Speech data are collected using the voice recorder of the Neural Impairment Test Suite (NITS) mobile app. Features are extracted from pitch contours, Mel-frequency cepstral coefficients (MFCC), Gammatone cepstral coefficients (GTCC), Gabor (analytic Morlet) wavelets, and auditory spectrograms. Accuracies of 94.50% (BiLSTM) and 96.3% (WST-SVM) are achieved on the healthy vs. impaired classification problem. The developed method can be applied to automated CNSD patient health state monitoring and clinical decision support systems, as well as serving as part of the Internet of Medical Things (IoMT).
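As a rough illustration of the wavelet-domain features named in the abstract, the sketch below computes first-order scattering-style coefficients: the signal is convolved with complex Gabor wavelets, the modulus is taken, and the result is averaged over time. This is not the authors' implementation. The function names, the filter centre frequencies, the `n_cycles` bandwidth setting, and the synthetic test tone are all assumptions made for the example, and a full scattering transform would add a second layer and a proper low-pass window. The resulting coefficient vectors would then be passed to a classifier such as an SVM, which is omitted here.

```python
import cmath
import math


def gabor_wavelet(center_freq, sample_rate, n_cycles=6.0):
    """Complex Gabor wavelet taps (hypothetical helper, not from the paper)."""
    sigma = n_cycles / (2.0 * math.pi * center_freq)   # time spread in seconds
    half = int(math.ceil(3.0 * sigma * sample_rate))   # truncate at +/- 3 sigma
    taps = []
    for k in range(-half, half + 1):
        t = k / sample_rate
        envelope = math.exp(-0.5 * (t / sigma) ** 2)
        taps.append(envelope * cmath.exp(2j * math.pi * center_freq * t))
    norm = sum(abs(w) for w in taps)                   # unit gain at center_freq
    return [w / norm for w in taps]


def convolve_same(signal, kernel):
    """Direct same-length convolution with zero padding (slow but explicit)."""
    half = len(kernel) // 2
    out = []
    for n in range(len(signal)):
        acc = 0j
        for k, w in enumerate(kernel):
            idx = n - (k - half)
            if 0 <= idx < len(signal):
                acc += signal[idx] * w
        out.append(acc)
    return out


def first_order_scattering(signal, sample_rate, center_freqs):
    """Return one time-averaged |signal * wavelet| coefficient per wavelet."""
    feats = []
    for f in center_freqs:
        filtered = convolve_same(signal, gabor_wavelet(f, sample_rate))
        modulus = [abs(v) for v in filtered]
        feats.append(sum(modulus) / len(modulus))
    return feats


# Usage on a synthetic 200 Hz tone (assumed data, 0.1 s at 4 kHz).
sample_rate = 4000
tone = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(400)]
feats = first_order_scattering(tone, sample_rate, [100.0, 200.0, 400.0])
print(feats)
```

The wavelet centred on the tone's frequency should produce the largest coefficient, which is the property that lets such features separate signals with different spectral content.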
Pages: 96162-96172
Page count: 11