Multimodal deep learning for dementia classification using text and audio

Cited by: 1

Authors:

Lin, Kaiying [1, 2]

Washington, Peter Y. [1]

Affiliations:

[1] Univ Hawaii, Dept Informat & Comp Sci, Honolulu, HI 96822 USA

[2] Univ Hawaii, Dept Linguist, Honolulu, HI 96822 USA

Source:

SCIENTIFIC REPORTS | 2024 / Volume 14 / Issue 01

Funding:

National Science Foundation (USA);

Keywords:

ALZHEIMERS-DISEASE;

DOI:

10.1038/s41598-024-64438-1

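Chinese Library Classification (CLC) number:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification code: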

07 ; 0710 ; 09 ;

Abstract:

Dementia is a progressive neurological disorder that affects the daily lives of older adults, impacting their verbal communication and cognitive function. Early diagnosis is important to enhance the lifespan and quality of life for affected individuals. Despite its importance, diagnosing dementia is a complex process. Automated machine learning solutions involving multiple types of data have the potential to improve the process of automated dementia screening. In this study, we build deep learning models to classify dementia cases from controls using the Pitt Cookie Theft dataset from DementiaBank, a database of short participant responses to the structured task of describing a picture of a cookie theft. We fine-tune Wav2vec and Word2vec baseline models to make binary predictions of dementia from audio recordings and text transcripts, respectively. We conduct experiments with four versions of the dataset: (1) the original data, (2) the data with short sentences removed, (3) text-based augmentation of the original data, and (4) text-based augmentation of the data with short sentences removed. Our results indicate that synonym-based text data augmentation generally enhances the performance of models that incorporate the text modality. Without data augmentation, models using the text modality achieve around 60% accuracy and 70% AUROC scores, and with data augmentation, the models achieve around 80% accuracy and 90% AUROC scores. We do not observe significant improvements in performance with the addition of audio or timestamp information into the model. We include a qualitative error analysis of the sentences that are misclassified under each study condition. This study provides preliminary insights into the effects of both text-based data augmentation and multimodal deep learning for automated dementia classification.
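The synonym-based text augmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which synonym source or replacement rate the authors used, so the `SYNONYMS` table, the `augment` and `augment_dataset` helpers, and the parameters `p` and `n_copies` below are all hypothetical and serve only to show the general idea: each augmented copy keeps the label of the transcript it was derived from.

```python
import random

# Hypothetical toy synonym table (an assumption for illustration only;
# the abstract does not state which synonym source the study used).
SYNONYMS = {
    "boy": ["lad", "kid"],
    "cookie": ["biscuit"],
    "falling": ["tumbling", "dropping"],
}


def augment(sentence, rng, p=0.5):
    """Return a copy of `sentence` where each word that has an entry in
    SYNONYMS is replaced by a random synonym with probability `p`."""
    out = []
    for word in sentence.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < p:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


def augment_dataset(samples, n_copies=2, seed=0, p=0.5):
    """Given (text, label) pairs, return the originals followed by
    `n_copies` augmented variants of each, with labels unchanged."""
    rng = random.Random(seed)
    augmented = list(samples)
    for text, label in samples:
        for _ in range(n_copies):
            augmented.append((augment(text, rng, p), label))
    return augmented
```

Calling `augment_dataset([("the boy takes a cookie", 1)])` would return the original pair plus two variants carrying the same label. In a real experiment, augmentation would normally be applied to the training split only, so that evaluation data stays unaltered.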

Pages: 10
