Natural language processing and machine learning of electronic health records for prediction of first-time suicide attempts

Cited by: 44
Authors
Tsui, Fuchiang R. [1 ,2 ,3 ,4 ]
Shi, Lingyun [1 ,3 ]
Ruiz, Victor [1 ,3 ]
Ryan, Neal D. [5 ]
Biernesser, Candice [5 ]
Iyengar, Satish [6 ]
Walsh, Colin G. [7 ]
Brent, David A. [5 ]
Affiliations
[1] Childrens Hosp Philadelphia, Tsui Lab, Philadelphia, PA 19104 USA
[2] Childrens Hosp Philadelphia, Dept Anesthesiol & Crit Care Med, Philadelphia, PA 19104 USA
[3] Childrens Hosp Philadelphia, Dept Biomed & Hlth Informat, Philadelphia, PA 19104 USA
[4] Univ Penn, Perelman Sch Med, Dept Anesthesiol & Crit Care, Philadelphia, PA 19104 USA
[5] Univ Pittsburgh, Sch Med, Dept Psychiat, Pittsburgh, PA USA
[6] Univ Pittsburgh, Sch Arts & Sci, Dept Stat, Pittsburgh, PA USA
[7] Vanderbilt Univ, Sch Med, Dept Biomed Informat, Nashville, TN 37212 USA
Keywords
suicide attempt; machine learning; natural language processing; electronic health records; PSYCHOLOGICAL AUTOPSY; CARE CONTACTS; ODDS RATIO; CLASSIFICATION; EXTRACTION; DECEDENTS; BEHAVIOR; RISK;
DOI
10.1093/jamiaopen/ooab011
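Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)];
Discipline classification code
Abstract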
Objective: Limited research exists on predicting first-time suicide attempts, which account for two-thirds of suicide decedents. We aimed to predict first-time suicide attempts using a large data-driven approach that applies natural language processing (NLP) and machine learning (ML) to unstructured (narrative) clinical notes and structured electronic health record (EHR) data.
Methods: This case-control study included patients aged 10-75 years who were seen between 2007 and 2016 in emergency departments and inpatient units. Cases were first-time suicide attempts identified from coded diagnoses; controls were patients without suicide attempts, randomly selected regardless of demographics at a ratio of nine controls per case. Four data-driven ML models were evaluated using 2-year historical EHR data prior to suicide attempt or control index visits, with prediction windows from 7 to 730 days. Patients without any historical notes were excluded. Model accuracy and robustness were evaluated on a blind dataset (30% of the cohort).
Results: The study cohort included 45 238 patients (5099 cases, 40 139 controls) comprising 54 651 variables from 5.7 million structured records and 798 665 notes. Using both unstructured and structured data resulted in significantly greater accuracy than structured data alone (area under the curve [AUC]: 0.932 vs. 0.901, P<.001). The best-predicting model utilized 1726 variables with AUC = 0.932 (95% CI, 0.922-0.941). The model was robust across multiple prediction windows and across subgroups defined by demographics, point of most recent historical clinical contact, and depression diagnosis history.
Conclusions: Our large data-driven approach using both structured and unstructured EHR data demonstrated accurate and robust first-time suicide attempt prediction, and has the potential to be deployed across various populations and clinical settings.
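Note on the reported metric: the abstract compares models by AUC. As a minimal sketch of what that number measures, and not the authors' implementation, the Python snippet below computes AUC through its rank interpretation: the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The function name `auc` and the toy scores are illustrative assumptions and are not taken from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    scores: predicted risk scores (higher means more likely a case).
    labels: 1 for a case, 0 for a control.
    """
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    control_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0   # case ranked above control
            elif c == k:
                wins += 0.5   # tie counts as half
    return wins / (len(case_scores) * len(control_scores))


# Toy data (hypothetical, not from the study): two cases, two controls.
toy_scores = [0.8, 0.4, 0.6, 0.2]
toy_labels = [1, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)  # 3 of 4 case-control pairs are ordered correctly
```

The pairwise loop is quadratic in cohort size, so it only suits small illustrations; for a cohort as large as the one described here, a rank-based or library implementation would be the practical choice.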
Pages: 13
Related papers
50 in total
  • [1] Prediction and evaluation of combination pharmacotherapy using natural language processing, machine learning and patient electronic health records
    Ding, Pingjian
    Pan, Yiheng
    Wang, Quanqiu
    Xu, Rong
    JOURNAL OF BIOMEDICAL INFORMATICS, 2022, 133
  • [2] Machine learning for suicide risk prediction in children and adolescents with electronic health records
    Chang Su
    Robert Aseltine
    Riddhi Doshi
    Kun Chen
    Steven C. Rogers
    Fei Wang
    Translational Psychiatry, 10
  • [3] Machine learning for suicide risk prediction in children and adolescents with electronic health records
    Su, Chang
    Aseltine, Robert
    Doshi, Riddhi
    Chen, Kun
    Rogers, Steven C.
    Wang, Fei
    TRANSLATIONAL PSYCHIATRY, 2020, 10 (01)
  • [4] Using Natural Language Processing and Machine Learning to Identify Incident Stroke From Electronic Health Records
    Zhao, Yiqing
    Fu, Sunyang
    Bielinski, Suzette J.
    Decker, Paul
    Chamberlain, Alanna M.
    Roger, Veronique L.
    Liu, Hongfang
    Larson, Nicolas B.
    CIRCULATION, 2020, 141
  • [5] Natural Language Processing and Machine Learning to Identify People Who Inject Drugs in Electronic Health Records
    Goodman-Meza, David
    Tang, Amber
    Aryanfar, Babak
    Vazquez, Sergio
    Gordon, Adam J.
    Goto, Michihiko
    Goetz, Matthew Bidwell
    Shoptaw, Steven
    Bui, Alex A. T.
    OPEN FORUM INFECTIOUS DISEASES, 2022, 9 (09)
  • [6] Machine Learning and Natural Language Processing to Improve Classification of Atrial Septal Defects in Electronic Health Records
    Guo, Yuting
    Shi, Haoming
    Book, Wendy M.
    Ivey, Lindsey Carrie
    Rodriguez, Fred H.
    Sameni, Reza
    Raskind-Hood, Cheryl
    Robichaux, Chad
    Downing, Karrie F.
    Sarker, Abeed
    BIRTH DEFECTS RESEARCH, 2025, 117 (03)
  • [7] Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation
    Zhao, Yiqing
    Fu, Sunyang
    Bielinski, Suzette J.
    Decker, Paul A.
    Chamberlain, Alanna M.
    Roger, Veronique L.
    Liu, Hongfang
    Larson, Nicholas B.
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2021, 23 (03)
  • [8] Predictive model for a second hip fracture occurrence using natural language processing and machine learning on electronic health records
    Larrainzar-Garijo, Ricardo
    Fernandez-Tormos, Esther
    Collado-Escudero, Carlos Alberto
    Ibanez, Maria Alcantud
    Francisco, Fernando Onorbe-San
    Marin-Corral, Judith
    Casadevall, David
    Donaire-Gonzalez, David
    Martinez-Sanchez, Luisa
    Cabal-Hierro, Lucia
    Benavent, Diego
    Branas, Fatima
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [9] Predictive model for a second hip fracture occurrence using natural language processing and machine learning on electronic health records
    Ricardo Larrainzar-Garijo
    Esther Fernández-Tormos
    Carlos Alberto Collado-Escudero
    María Alcantud Ibáñez
    Fernando Oñorbe-San Francisco
    Judith Marin-Corral
    David Casadevall
    David Donaire-Gonzalez
    Luisa Martínez-Sanchez
    Lucia Cabal-Hierro
    Diego Benavent
    Fátima Brañas
    Scientific Reports, 14
  • [10] Using Natural Language Processing on Electronic Health Records to Enhance Detection and Prediction of Psychosis Risk
    Irving, Jessica
    Patel, Rashmi
    Oliver, Dominic
    Colling, Craig
    Pritchard, Megan
    Broadbent, Matthew
    Baldwin, Helen
    Stahl, Daniel
    Stewart, Robert
    Fusar-Poli, Paolo
    SCHIZOPHRENIA BULLETIN, 2021, 47 (02) : 405 - 414