Crossing the "Cookie Theft" Corpus Chasm: Applying What BERT Learns From Outside Data to the ADReSS Challenge Dementia Detection Task

Cited by: 19
Authors
Guo, Yue [1]
Li, Changye [2]
Roan, Carol [3]
Pakhomov, Serguei [2]
Cohen, Trevor [1]
Affiliations
[1] Univ Washington, Dept Biomed Informat & Med Educ, Seattle, WA 98195 USA
[2] Univ Minnesota, Pharmaceut Care & Hlth Syst, Minneapolis, MN 55417 USA
[3] Univ Wisconsin, Dept Sociol, Madison, WI 53706 USA
Source
FRONTIERS IN COMPUTER SCIENCE | 2021 / Volume 3
Funding
National Science Foundation (USA)
Keywords
dementia diagnosis; Alzheimer's disease; natural language processing; BERT; machine learning; ALZHEIMER-DISEASE; VERBAL FLUENCY; DIAGNOSIS; CARE; COHORT;
DOI
10.3389/fcomp.2021.642517
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Large amounts of labeled data are a prerequisite for training accurate and reliable machine learning models. In the medical domain in particular, this requirement is also a stumbling block, because accurately labeled data are hard to obtain. DementiaBank is a publicly available corpus of spontaneous speech samples from a picture description task, widely used to study the language characteristics of patients with Alzheimer's disease (AD) and to train classification models that distinguish patients with AD from healthy controls. It is relatively small, a limitation that is exacerbated when restricting to the balanced subset used in the Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS) challenge. We build on previous work showing that the performance of traditional machine learning models on DementiaBank can be improved by adding normative data from other sources, and evaluate whether such extrinsic data can further improve the performance of state-of-the-art deep learning-based methods on the ADReSS challenge dementia detection task. To this end, we developed a new corpus of professionally transcribed recordings from the Wisconsin Longitudinal Study (WLS), resulting in 1,366 additional Cookie Theft Task transcripts and increasing the available training data by an order of magnitude. Using these data in conjunction with DementiaBank is challenging because the WLS metadata corresponding to these transcripts do not contain dementia diagnoses. However, the cognitive status of WLS participants can be inferred from the results of several cognitive tests available in the WLS data, including semantic verbal fluency. In this work, we evaluate the utility of the WLS 'controls' (participants without indications of abnormal cognitive status), both alone and in conjunction with inferred 'cases' (participants with such indications), for training deep learning models to discriminate between language produced by patients with dementia and by healthy controls.
We find that incorporating WLS data when training a BERT model on ADReSS data improves its performance on the ADReSS dementia detection task, supporting the hypothesis that WLS data add value in this context. We also demonstrate that weighted cost functions and additional prediction targets may be effective ways to address issues arising from class imbalance and from confounding effects due to data provenance.
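As an illustration of what a weighted cost function for class imbalance can look like, the following is a minimal sketch and not the authors' implementation. The abstract does not state which weighting scheme was used; this sketch assumes inverse class-frequency weights (the same heuristic scikit-learn calls "balanced") applied to a binary cross-entropy loss. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood for binary labels.

    probs[i] is the predicted probability that example i is class 1.
    Each example's loss is scaled by the weight of its true class, and
    the sum is normalized by the total weight.
    """
    total, norm = 0.0, 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        w = weights[y]
        total += -w * math.log(p_true)
        norm += w
    return total / norm


# Hypothetical imbalanced toy data: nine controls (0) and one case (1).
labels = [0] * 9 + [1]
weights = inverse_frequency_weights(labels)

# An uninformative model that predicts 0.5 for every example.
uniform_loss = weighted_cross_entropy([0.5] * 10, labels, weights)

# A model that confidently misses the single case is penalized more
# heavily than one that confidently misses a single control.
miss_case = weighted_cross_entropy([0.1] * 10, labels, weights)
miss_control = weighted_cross_entropy([0.9] + [0.1] * 8 + [0.9], labels, weights)
```

With these weights the rare class contributes as much total weight as the common class, so errors on the single case are not drowned out by the nine controls.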
Pages: 10