Using General-purpose Sentiment Lexicons for Suicide Risk Assessment in Electronic Health Records: Corpus-Based Analysis

Cited by: 13
Authors
Bittar, Andre [1]
Velupillai, Sumithra [1]
Roberts, Angus [1]
Dutta, Rina [1, 2]
Affiliations
[1] King's College London, Institute of Psychiatry, Psychology & Neuroscience, 16 De Crespigny Park, London SE5 8AF, England
[2] South London and Maudsley NHS Foundation Trust, London, England
Funding
UK Research and Innovation; UK Medical Research Council
Keywords
psychiatry; suicide; attempted; risk assessment; electronic health records; sentiment analysis; natural language processing; corpus linguistics
DOI
10.2196/22397
Chinese Library Classification number
R-058
Abstract
Background: Suicide is a serious public health issue, accounting for 1.4% of all deaths worldwide. Current risk assessment tools are reported as performing little better than chance in predicting suicide. New methods for studying dynamic features in electronic health records (EHRs) are being increasingly explored. One avenue of research involves using sentiment analysis to examine clinicians' subjective judgments when reporting on patients. Several recent studies have used general-purpose sentiment analysis tools to automatically identify negative and positive words within EHRs to test correlations between sentiment extracted from the texts and specific medical outcomes (eg, risk of suicide or in-hospital mortality). However, little attention has been paid to analyzing the specific words identified by general-purpose sentiment lexicons when applied to EHR corpora.

Objective: This study aims to quantitatively and qualitatively evaluate the coverage of six general-purpose sentiment lexicons against a corpus of EHR texts to ascertain the extent to which such lexical resources are fit for use in suicide risk assessment.

Methods: The data for this study were a corpus of 198,451 EHR texts made up of two subcorpora drawn from a 1:4 case-control study comparing clinical notes written over the period leading up to a suicide attempt (cases, n=2913) with those not preceding such an attempt (controls, n=14,727). We calculated word frequency distributions within each subcorpus to identify representative keywords for both the case and control subcorpora. We quantified the relative coverage of the 6 lexicons with respect to this list of representative keywords in terms of weighted precision, recall, and F score.

Results: The six lexicons achieved reasonable precision (0.53-0.68) but very low recall (0.04-0.36). Many of the most representative keywords in the suicide-related (case) subcorpus were not identified by any of the lexicons. The sentiment-bearing status of these keywords for this use case is thus doubtful.

Conclusions: Our findings indicate that these 6 sentiment lexicons are not optimal for use in suicide risk assessment. We propose a set of guidelines for the creation of more suitable lexical resources for distinguishing suicide-related from non-suicide-related EHR texts.
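The coverage evaluation described in the Methods can be illustrated with a minimal Python sketch. The abstract does not say how precision and recall were weighted, so the scheme below (each word carries a numeric weight, such as a corpus frequency) is an assumption and may differ from the authors' method. The function name `weighted_coverage` is hypothetical, and the toy words and weights are invented for illustration, not taken from the study.

```python
from typing import Dict, Tuple


def weighted_coverage(
    keyword_weights: Dict[str, float],
    lexicon_hits: Dict[str, float],
) -> Tuple[float, float, float]:
    """Return (precision, recall, F score) for one lexicon.

    keyword_weights: representative keyword -> weight (treated as the gold set).
    lexicon_hits: lexicon word found in the corpus -> weight.
    ASSUMPTION: weights are any non-negative scores, e.g. corpus frequencies.
    """
    overlap = set(keyword_weights) & set(lexicon_hits)

    # Weight of lexicon hits that are also representative keywords.
    matched_lexicon = sum(lexicon_hits[w] for w in overlap)
    # Weight of representative keywords that the lexicon covers.
    matched_keywords = sum(keyword_weights[w] for w in overlap)

    total_lexicon = sum(lexicon_hits.values())
    total_keywords = sum(keyword_weights.values())

    precision = matched_lexicon / total_lexicon if total_lexicon else 0.0
    recall = matched_keywords / total_keywords if total_keywords else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Toy data (invented): two keywords, one of which the lexicon covers.
toy_keywords = {"overdose": 3.0, "ligature": 1.0}
toy_lexicon_hits = {"overdose": 2.0, "happy": 2.0}
p, r, f = weighted_coverage(toy_keywords, toy_lexicon_hits)
print(p, r, f)
```

Under this scheme, a lexicon that matches many corpus words that are not representative keywords loses precision, while a lexicon that misses highly weighted keywords loses recall, which is the pattern the Results report.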
Pages: 14