Enriching Complex Networks with Word Embeddings for Detecting Mild Cognitive Impairment from Speech Transcripts

Cited by: 19
Authors
dos Santos, Leandro B. [1]
Correa, Edilson A., Jr. [1]
Oliveira, Osvaldo N., Jr. [2]
Amancio, Diego R. [1]
Mansur, Leticia L. [3]
Aluisio, Sandra M. [1]
Affiliations
[1] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, SP, Brazil
[2] Univ Sao Paulo, Sao Carlos Inst Phys, Sao Carlos, SP, Brazil
[3] Univ Sao Paulo, Dept Physiotherapy Speech Pathol & Occupat Therap, Sao Paulo, SP, Brazil
Source
PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1 | 2017
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
ALZHEIMERS-DISEASE; CLASSIFICATION; LANGUAGE; DEMENTIA; TEXT;
DOI
10.18653/v1/P17-1118
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Mild Cognitive Impairment (MCI) is a mental disorder that is difficult to diagnose. Linguistic features, mainly from parsers, have been used to detect MCI, but this is not suitable for large-scale assessments. MCI disfluencies produce non-grammatical speech that requires manual or high-precision automatic correction of transcripts. In this paper, we modeled transcripts into complex networks and enriched them with word embeddings (CNE) to better represent short texts produced in neuropsychological assessments. The network measurements were applied with well-known classifiers to automatically identify MCI in transcripts, in a binary classification task. A comparison was made with the performance of traditional approaches using Bag of Words (BoW) and linguistic features for three datasets: DementiaBank in English, and Cinderella and Arizona-Battery in Portuguese. Overall, CNE provided higher accuracy than using only complex networks, while Support Vector Machine was superior to other classifiers. CNE provided the highest accuracies for DementiaBank and Cinderella, but BoW was more efficient for the Arizona-Battery dataset, probably owing to its short narratives. The approach using linguistic features yielded higher accuracy if the transcriptions of the Cinderella dataset were manually revised. Taken together, the results indicate that complex networks enriched with embeddings are promising for detecting MCI in large-scale assessments.
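Illustrative sketch
The abstract describes modeling a transcript as a complex network, enriching it with word embeddings, and passing network measurements to a classifier. The Python sketch below illustrates that general idea under stated assumptions and is not the authors' implementation: this record does not give the construction details, so the adjacency-based edges, the cosine-similarity threshold, the two measurements, and every name (`build_network`, `features`, `TOY_EMBEDDINGS`) are hypothetical, and the embedding vectors are made-up toy values.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_network(tokens, embeddings, threshold=0.9):
    """Return an undirected adjacency map: word -> set of neighbours.

    Assumption: the threshold value is arbitrary and chosen for the toy data.
    """
    graph = {word: set() for word in tokens}
    # Co-occurrence edges: link each word to the word that follows it.
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            graph[left].add(right)
            graph[right].add(left)
    # Enrichment edges: link words whose embeddings are similar enough.
    for a, b in combinations(sorted(graph), 2):
        if a in embeddings and b in embeddings:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def clustering(graph, node):
    """Local clustering coefficient of one node."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def features(graph):
    """Two example network measurements: mean degree and mean clustering."""
    n = len(graph)
    mean_degree = sum(len(nb) for nb in graph.values()) / n
    mean_clustering = sum(clustering(graph, v) for v in graph) / n
    return [mean_degree, mean_clustering]


# Hypothetical 2-dimensional embeddings, invented for this example only.
TOY_EMBEDDINGS = {
    "girl": [1.0, 0.1],
    "princess": [0.9, 0.2],
    "shoe": [0.1, 1.0],
    "slipper": [0.2, 0.9],
}

tokens = "girl lost shoe princess found slipper".split()
plain = build_network(tokens, {})                  # co-occurrence edges only
enriched = build_network(tokens, TOY_EMBEDDINGS)   # plus similarity edges
```

In a short transcript the co-occurrence network is sparse, and the similarity edges add links between related words that never appear side by side. A feature vector such as `features(enriched)` would then be one input row for a classifier; the classifier step is omitted here.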
Pages: 1284-1296
Page count: 13