Using natural language processing to improve suicide classification requires consideration of race

Cited by: 18
Authors
Rahman, Nusrat [1, 2]
Mozer, Reagan [3]
McHugh, R. Kathryn [4, 5]
Rockett, Ian R. H. [6, 7]
Chow, Clifton M. [3, 8]
Vaughan, Gregory [3]
Affiliations
[1] Bentley Univ, Dept Nat & Appl Sci, Waltham, MA 02452 USA
[2] Bentley Univ, Hlth Thought Leadership Network, Waltham, MA 02452 USA
[3] Bentley Univ, Dept Math Sci, Waltham, MA 02452 USA
[4] McLean Hosp, Div Alcohol Drugs & Addict, 115 Mill St, Belmont, MA 02178 USA
[5] Harvard Med Sch, Dept Psychiat, Boston, MA 02115 USA
[6] West Virginia Univ, Dept Epidemiol & Biostat, Morgantown, WV 26506 USA
[7] Univ Rochester, Med Ctr, Dept Psychiat, Rochester, NY 14642 USA
[8] Bentley Univ, Acad Technol Ctr, Waltham, MA 02452 USA
Keywords
National Violent Death Reporting System; natural language processing; statistical text analysis; DEATH; MISCLASSIFICATION; INTOXICATION; PREVENTION; STATISTICS
DOI
10.1111/sltb.12862
Chinese Library Classification
R749 [Psychiatry]
Discipline classification code
100205
Abstract
Objectives: To improve the accuracy of classification of deaths of undetermined intent and to examine racial differences in misclassification. Methods: We used natural language processing and statistical text analysis on restricted-access case narratives of suicides, homicides, and undetermined deaths in 37 states collected from the National Violent Death Reporting System (NVDRS) (2017). We fit separate race-specific classification models to predict suicide among undetermined cases using data from known homicide cases (true negatives) and known suicide cases (true positives). Results: A classifier trained on an all-race dataset predicts fewer than half of these cases as suicide. Importantly, our analysis yields an estimated suicide rate for the Black population comparable with the typical detection rate for the White population, indicating that misclassification excess is endemic for Black suicide. This problem may be mitigated by using race-specific data. Our findings, based on the statistical text analysis, also reveal systematic differences in the phrases identified as most predictive of suicide. Conclusions: This study highlights the need to understand the reasons underlying suicide rate differences and for further testing of strategies to reduce misclassification, particularly among people of color.
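The idea described under Methods (train on known suicides and homicides, fit one model per group, then score undetermined cases) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which classifier or text features the authors used, so a small multinomial naive Bayes over word counts is swapped in here, and the function names, group labels (`group_a`, `group_b`), and toy narratives are all invented for the example. They are not NVDRS data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def train_nb(narratives, labels):
    """Fit a multinomial naive Bayes model; label 1 = suicide, 0 = homicide.

    Assumes both classes appear at least once in the training data.
    """
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(narratives, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return {"word_counts": word_counts, "class_counts": class_counts, "vocab": vocab}


def predict_suicide_prob(model, text):
    """Posterior probability of label 1, using add-one smoothing."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    log_scores = {}
    for label in (0, 1):
        n_tokens = sum(model["word_counts"][label].values())
        score = math.log(model["class_counts"][label] / total_docs)
        for tok in tokenize(text):
            if tok in model["vocab"]:  # ignore words never seen in training
                count = model["word_counts"][label][tok]
                score += math.log((count + 1) / (n_tokens + vocab_size))
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights[1] / (weights[0] + weights[1])


def train_by_group(records):
    """Fit one model per group from (group, narrative, label) records."""
    grouped = defaultdict(lambda: ([], []))
    for group, text, label in records:
        grouped[group][0].append(text)
        grouped[group][1].append(label)
    return {g: train_nb(texts, labels) for g, (texts, labels) in grouped.items()}


# Invented toy records for illustration only.
toy_records = [
    ("group_a", "decedent left a note and had a history of depression", 1),
    ("group_a", "decedent was shot by a suspect during an argument", 0),
    ("group_b", "family reported recent crisis and prior attempt", 1),
    ("group_b", "victim was stabbed by an acquaintance after a dispute", 0),
]
models = train_by_group(toy_records)

# Score a hypothetical undetermined narrative with its group's model.
prob = predict_suicide_prob(models["group_a"], "note found history of depression")
print(f"estimated probability of suicide: {prob:.2f}")
```

Fitting a separate model per group lets the most predictive phrases differ between groups, which is the kind of difference the Results describe. A pooled model would instead learn a single set of word weights for all groups.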
Pages: 782-791
Page count: 10