Leveraging Deep Learning for Fine-Grained Categorization of Parkinson's Disease Progression Levels through Analysis of Vocal Acoustic Patterns

Times cited: 13
Authors
Malekroodi, Hadi Sedigh [1]
Madusanka, Nuwan [2]
Lee, Byeong-il [1,2,3]
Yi, Myunggi [1,2,3]
Affiliations
[1] Pukyong National University, Industry 4.0 Convergence Bionics Engineering, Busan 48513, South Korea
[2] Pukyong National University, Institute of Information Technology and Convergence, Digital Healthcare Research Center, Busan 48513, South Korea
[3] Pukyong National University, Division of Smart Healthcare, Busan 48513, South Korea
Source
Bioengineering-Basel | 2024 / Vol. 11 / Issue 3
Keywords
Parkinson's disease (PD); deep learning; transfer learning; speech analysis; mel spectrogram
DOI
10.3390/bioengineering11030295
Chinese Library Classification (CLC) numbers
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Speech impairments are often among the primary indicators of Parkinson's disease (PD), although they may not be readily apparent in its early stages. While previous studies focused predominantly on binary PD detection, this research explored the use of deep learning models to automatically classify sustained vowel recordings into three groups (healthy controls, mild PD, and severe PD) based on motor symptom severity scores. Popular convolutional neural network (CNN) architectures (VGG and ResNet) and the Swin vision transformer were fine-tuned on log mel spectrogram image representations of the segmented voice data. The research also investigated how audio segment length and the specific vowel sound affect the performance of these models. The findings indicated that longer segments yielded better performance. The models showed strong capability in distinguishing PD from healthy subjects, achieving over 95% precision. However, reliably discriminating between mild and severe PD cases remained challenging. VGG16 achieved the best overall classification performance, with 91.8% accuracy and the largest area under the ROC curve. Focusing the analysis on the vowel /u/ could further improve accuracy to 96%. Visualization techniques such as Grad-CAM also highlighted that the CNN models focused on localized spectrogram regions, whereas the transformers attended to more widespread patterns. Overall, this work showed the potential of deep learning for non-invasive screening and monitoring of PD progression from voice recordings, but larger labeled multi-class datasets are needed to further improve severity classification.
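The preprocessing the abstract describes (splitting sustained-vowel recordings into fixed-length segments and converting each segment into a log mel spectrogram "image") can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' code: the helper names, the HTK-style mel formula, non-overlapping segments with the trailing remainder dropped, and the defaults of a 1024-sample FFT, 256-sample hop and 64 mel bands are all hypothetical choices, since the record does not give the paper's actual settings.

```python
import numpy as np

# Minimal sketch, NOT the authors' code. All parameter values below
# (segment length, n_fft, hop, n_mels) are hypothetical placeholders.


def segment_signal(signal: np.ndarray, sr: int, seg_seconds: float) -> list[np.ndarray]:
    """Split a 1-D recording into non-overlapping segments of seg_seconds.

    The trailing remainder shorter than one segment is dropped (an assumption).
    """
    seg_len = int(round(seg_seconds * sr))
    n_segments = len(signal) // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


def hz_to_mel(f):
    """HTK-style mel scale (one of several common definitions)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr: int, n_fft: int, n_mels: int) -> np.ndarray:
    """Triangular filters of shape (n_mels, n_fft // 2 + 1), equally spaced in mel."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(segment: np.ndarray, sr: int, n_fft: int = 1024,
                        hop: int = 256, n_mels: int = 64, eps: float = 1e-10) -> np.ndarray:
    """Return a (n_mels, n_frames) log mel spectrogram in decibels."""
    if len(segment) < n_fft:
        raise ValueError("segment is shorter than one FFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(segment) - n_fft) // hop
    # Hann-windowed frames, shape (n_frames, n_fft)
    frames = np.stack([segment[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    # Power spectrum per frame, shape (n_frames, n_fft // 2 + 1)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # Project onto mel bands, shape (n_mels, n_frames)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T
    # Floor at eps before the log so empty bands stay finite
    return 10.0 * np.log10(np.maximum(mel, eps))
```

For example, a 2.5 s recording at 16 kHz split with `seg_seconds=1.0` yields two 1 s segments, each mapped to a 64 x 59 array. Before fine-tuning an ImageNet-pretrained CNN or transformer, such arrays would typically be resized and replicated or stacked into three image channels; that step, and the specific segment lengths the study compared, are not shown here.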
Pages: 23
References (57 in total)
[1] Atliha V. P 2020 IEEE OPEN C E, 2020.
[2] Aversano L. P 2022 IEEE INT C EV, 2022, p. 1.
[3] Benba, Achraf; Jilbab, Abdelilah; Hammouch, Ahmed. Analysis of multiple types of voice recordings in cepstral domain using MFCC for discriminating between patients with Parkinson's disease and healthy people. International Journal of Speech Technology, 2016, 19(3): 449-456.
[4] Dimauro, Giovanni; Di Nicola, Vincenzo; Bevilacqua, Vitoantonio; Caivano, Danilo; Girardi, Francesco. Assessment of Speech Intelligibility in Parkinson's Disease Using a Speech-To-Text System. IEEE Access, 2017, 5: 22199-22208.
[5] Dimauro G. IEEE INT SYM MED MEA, 2016, p. 352.
[6] Dosovitskiy A. arXiv, 2021. arXiv:2010.11929. DOI: 10.48550/ARXIV.2010.11929.
[7] Farago, Paul; Stefaniga, Sebastian-Aurelian; Cordos, Claudia-Georgiana; Mihaila, Laura-Ioana; Hintea, Sorin; Pestean, Ana-Sorina; Beyer, Michel; Perju-Dumbrava, Lacramioara; Ilesan, Robert Radu. CNN-Based Identification of Parkinson's Disease from Continuous Speech in Noisy Environments. Bioengineering-Basel, 2023, 10(5).
[8] Govindu Aditi. Procedia Computer Science, 2023, p. 249. DOI: 10.1016/j.procs.2023.01.007.
[9] Harris, Charles R.; Millman, K. Jarrod; van der Walt, Stefan J.; Gommers, Ralf; Virtanen, Pauli; Cournapeau, David; Wieser, Eric; Taylor, Julian; Berg, Sebastian; Smith, Nathaniel J.; Kern, Robert; Picus, Matti; Hoyer, Stephan; van Kerkwijk, Marten H.; Brett, Matthew; Haldane, Allan; del Rio, Jaime Fernandez; Wiebe, Mark; Peterson, Pearu; Gerard-Marchant, Pierre; Sheppard, Kevin; Reddy, Tyler; Weckesser, Warren; Abbasi, Hameer; Gohlke, Christoph; Oliphant, Travis E. Array programming with NumPy. Nature, 2020, 585(7825): 357-362.
[10] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778.