Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system

Cited by: 5
Authors
Gray, Geoffrey M. [1]
Zirikly, Ayah [2]
Ahumada, Luis M. [1]
Rouhizadeh, Masoud [3]
Richards, Thomas [4]
Kitchen, Christopher [4]
Foroughmand, Iman [4]
Hatef, Elham [4, 5]
Affiliations
[1] Johns Hopkins All Childrens Hosp, Ctr Pediat Data Sci & Analyt Methodol, St Petersburg, FL USA
[2] Johns Hopkins Univ, Whiting Sch Engn, Dept Comp Sci, Baltimore, MD USA
[3] Univ Florida, Coll Pharm, Dept Pharmaceut Outcomes & Policy, Gainesville, FL USA
[4] Johns Hopkins Bloomberg Sch Publ Hlth, Dept Hlth Policy & Management, Ctr Populat Hlth Informat Technol, 624 N Broadway,Room 502, Baltimore, MD 21205 USA
[5] Johns Hopkins Sch Med, Dept Med, Div Gen Internal Med, Baltimore, MD USA
Keywords
social needs; residential instability; food insecurity; transportation; natural language processing; free text; DETERMINANTS; HOSPITALIZATION
DOI
10.1093/jamiaopen/ooad085
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Abstract
Objectives: To develop and test a scalable, performant, rule-based model for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the unstructured data in electronic health records (EHRs).
Materials and Methods: We included patients aged 18 years or older who received care at the Johns Hopkins Health System (JHHS) between July 2016 and June 2021 and had at least 1 unstructured (free-text) note in their EHR during the study period. We used a combination of manual lexicon curation and semiautomated lexicon creation for feature development. We developed an initial rule-based pipeline (Match Pipeline) using 2 keyword sets for each social needs domain. We performed rule-based keyword matching for distinct lexicons and tested the algorithm using an annotated dataset comprising 192 patients. Starting with a set of expert-identified keywords, we tested adjustments by evaluating the false positives and false negatives identified in the labeled dataset. We assessed the performance of the algorithm using precision, recall, and F1 score.
Results: The algorithm for identifying residential instability had the best overall performance, with weighted averages for precision, recall, and F1 score of 0.92, 0.84, and 0.92 for identifying patients with homelessness and 0.84, 0.82, and 0.79 for identifying patients with housing insecurity. Metrics for the food insecurity algorithm were high, but the transportation issues algorithm had the lowest overall performance.
Discussion: The NLP algorithm for identifying social needs at JHHS performed relatively well and offers an opportunity for implementation in a healthcare system.
Conclusion: The NLP approach developed in this project could be adapted and potentially operationalized in the routine data processes of a healthcare system.
Lay summary: We developed and tested an algorithm for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the free-text notes in electronic health records (EHRs). We included patients aged 18 years or older who received care at the Johns Hopkins Health System between July 2016 and June 2021 and had at least 1 note in their EHR during the study period. We developed keywords and phrases that describe the social needs, and we built natural language processing (NLP) algorithms that use those keywords to identify different social needs in free-text EHR notes. We assessed the performance of these algorithms by comparing what they identified in the notes with what a human reviewer identified through direct review of the same notes. The algorithm for identifying residential instability had the best overall performance and the algorithm for identifying food insecurity performed relatively well, but the transportation issues algorithm had the lowest overall performance. The NLP algorithms developed in this study could be implemented in different healthcare systems, and could be adapted and potentially operationalized in their routine data processes.
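The two steps described above, rule-based keyword matching against a per-domain lexicon and scoring the matches against human annotation with precision, recall, and F1, can be illustrated with a minimal sketch. This is not the authors' Match Pipeline: the lexicon entries, the function names (`compile_lexicon`, `match_domains`, `precision_recall_f1`), and the example notes are all hypothetical, since this record does not list the study's keywords. The sketch also ignores negation and context ("denies homelessness"), which a production rule set would likely need to handle.

```python
import re
from typing import Dict, List, Pattern, Set, Tuple

# Hypothetical lexicons for illustration only; the study's actual
# keyword sets are not given in this record.
LEXICONS: Dict[str, List[str]] = {
    "residential_instability": ["homeless", "homelessness", "eviction"],
    "food_insecurity": ["food insecurity", "food pantry", "not enough food"],
    "transportation_issues": ["no transportation", "lack of transportation"],
}


def compile_lexicon(phrases: List[str]) -> Pattern[str]:
    """Build one case-insensitive, whole-word pattern for a list of phrases."""
    # Longest phrases first so multi-word phrases win over shorter overlaps.
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


PATTERNS: Dict[str, Pattern[str]] = {
    domain: compile_lexicon(phrases) for domain, phrases in LEXICONS.items()
}


def match_domains(note: str) -> Set[str]:
    """Return every social-needs domain whose lexicon matches the note."""
    return {domain for domain, pat in PATTERNS.items() if pat.search(note)}


def precision_recall_f1(
    predicted: List[bool], gold: List[bool]
) -> Tuple[float, float, float]:
    """Score binary predictions for one domain against human labels."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up notes and labels, not study data.
    notes = [
        "Patient is currently homeless and uses a food pantry.",
        "Reports lack of transportation to appointments.",
        "No acute distress.",
    ]
    gold_food = [True, False, False]
    pred_food = ["food_insecurity" in match_domains(n) for n in notes]
    print(precision_recall_f1(pred_food, gold_food))
```

Reviewing the false positives and false negatives that such a scorer surfaces is one way to decide which keywords to add or remove, in the spirit of the iterative lexicon adjustment the abstract describes. The abstract reports weighted averages; this sketch computes the plain per-domain metrics only.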
Pages: 9