Extracting social determinants of health from electronic health records using natural language processing: a systematic review

Cited by: 103
Authors
Patra, Braja G. [1]
Sharma, Mohit M. [1]
Vekaria, Veer [1]
Adekkanattu, Prakash [2]
Patterson, Olga V. [3, 4]
Glicksberg, Benjamin [5]
Lepow, Lauren A. [5]
Ryu, Euijung [6]
Biernacka, Joanna M. [6]
Furmanchuk, Al'ona [7]
George, Thomas J. [8]
Hogan, William [9]
Wu, Yonghui [8]
Yang, Xi [8]
Bian, Jiang [8]
Weissman, Myrna [10]
Wickramaratne, Priya [10]
Mann, J. John [10]
Olfson, Mark [10]
Campion, Thomas R., Jr. [1, 2]
Weiner, Mark [1]
Pathak, Jyotishman [1]
Affiliations
[1] Weill Cornell Med, Dept Populat Hlth Sci, 425 E 61st St,Suite 301, New York, NY 10065 USA
[2] Weill Cornell Med, Informat Technol & Serv, New York, NY 10065 USA
[3] Univ Utah, Dept Internal Med, Div Epidemiol, Salt Lake City, UT 84112 USA
[4] US Dept Vet Affairs, Salt Lake City, UT USA
[5] Icahn Sch Med Mt Sinai, New York, NY 10029 USA
[6] Mayo Clin, Dept Quantitat Hlth Sci, Rochester, MN USA
[7] Northwestern Univ, Chicago, IL 60611 USA
[8] Univ Florida, Dept Hlth Outcomes & Biomed Informat, Gainesville, FL USA
[9] Univ Florida, Coll Med, Dept Med, Div Hematol & Oncol, Gainesville, FL USA
[10] Columbia Univ, Vagelos Coll Phys & Surg, New York, NY USA
Keywords
social determinants of health; population health outcomes; electronic health records; natural language processing; information extraction; machine learning; PROBLEM OPIOID USE; BINGE-EATING DISORDER; AUTOMATED IDENTIFICATION; UNSTRUCTURED DATA; CARE; VALIDATION; ABUSE; RISK
DOI
10.1093/jamia/ocab170
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Objective: Social determinants of health (SDoH) are nonclinical dispositions that impact patient health risks and clinical outcomes. Leveraging SDoH in clinical decision-making can potentially improve diagnosis, treatment planning, and patient outcomes. Despite increased interest in capturing SDoH in electronic health records (EHRs), such information is typically locked in unstructured clinical notes. Natural language processing (NLP) is the key technology to extract SDoH information from clinical text and expand its utility in patient care and research. This article presents a systematic review of the state-of-the-art NLP approaches and tools that focus on identifying and extracting SDoH data from unstructured clinical text in EHRs.
Materials and Methods: A broad literature search was conducted in February 2021 using 3 scholarly databases (ACL Anthology, PubMed, and Scopus) following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 6402 publications were initially identified, and after applying the study inclusion criteria, 82 publications were selected for the final review.
Results: Smoking status (n=27), substance use (n=21), homelessness (n=20), and alcohol use (n=15) are the most frequently studied SDoH categories. Homelessness (n=7) and other less-studied SDoH (eg, education, financial problems, social isolation and support, family problems) are mostly identified using rule-based approaches. In contrast, machine learning approaches are popular for identifying smoking status (n=13), substance use (n=9), and alcohol use (n=9).
Conclusion: NLP offers significant potential to extract SDoH data from narrative clinical notes, which in turn can aid in the development of screening tools, risk prediction models, and clinical decision support systems.
Pages: 2716-2727
Page count: 12
Related papers
88 in total
  • [1] Subtypes in patients with opioid misuse: A prognostic enrichment strategy using electronic health record data in hospitalized patients
    Afshar, Majid
    Joyce, Cara
    Dligach, Dmitriy
    Sharma, Brihat
    Kania, Robert
    Xie, Meng
    Swope, Kristin
    Salisbury-Afshar, Elizabeth
    Karnik, Niranjan S.
    [J]. PLOS ONE, 2019, 14 (07)
  • [2] Natural language processing and machine learning to identify alcohol misuse from the electronic health record in trauma patients: development and internal validation
    Afshar, Majid
    Phillips, Andrew
    Karnik, Niranjan
    Mueller, Jeanne
    To, Daniel
    Gonzalez, Richard
    Price, Ron
    Cooper, Richard
    Joyce, Cara
    Dligach, Dmitriy
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2019, 26 (03) : 254 - 261
  • [3] Identifying child abuse through text mining and machine learning
    Amrit, Chintan
    Paauw, Tim
    Aly, Robin
    Lavric, Miha
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2017, 88 : 402 - 418
  • [4] Machine learning for phenotyping opioid overdose events
    Badger, Jonathan
    LaRose, Eric
    Mayer, John
    Bashiri, Fereshteh
    Page, David
    Peissig, Peggy
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2019, 94
  • [5] Baldwin Karen Brandt, 2008, J Healthc Qual, V30, P24
  • [6] Mining 100 million notes to find homelessness and adverse childhood experiences: 2 case studies of rare and severe social determinants of health in electronic health records
    Bejan, Cosmin A.
    Angiolillo, John
    Conway, Douglas
    Nash, Robertson
    Shirey-Rice, Jana K.
    Lipworth, Loren
    Cronin, Robert M.
    Pulley, Jill
    Kripalani, Sunil
    Barkin, Shari
    Johnson, Kevin B.
    Denny, Joshua C.
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2018, 25 (01) : 61 - 71
  • [7] Healthcare Costs and Resource Utilization of Patients with Binge-Eating Disorder and Eating Disorder Not Otherwise Specified in the Department of Veterans Affairs
    Bellows, Brandon K.
    DuVall, Scott L.
    Kamauu, Aaron W. C.
    Supina, Dylan
    Babcock, Thomas
    LaFleur, Joanne
    [J]. INTERNATIONAL JOURNAL OF EATING DISORDERS, 2015, 48 (08) : 1082 - 1091
  • [8] Automated identification of patients with a diagnosis of binge eating disorder from narrative electronic health records
    Bellows, Brandon K.
    LaFleur, Joanne
    Kamauu, Aaron W. C.
    Ginter, Thomas
    Forbush, Tyler B.
    Agbor, Stephen
    Supina, Dylan
    Hodgkins, Paul
    DuVall, Scott L.
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2014, 21 (E1) : E163 - E168
  • [9] Discovering New Social Determinants of Health Concepts from Unstructured Data: Framework and Evaluation
    Bettencourt-Silva, Joao H.
    Mulligan, Natalia
    Sbodio, Marco
    Segrave-Daly, John
    Williams, Richard
    Lopez, Vanessa
    Alzate, Carlos
    [J]. DIGITAL PERSONALIZED HEALTH AND MEDICINE, 2020, 270 : 173 - 177
  • [10] The association between neighbourhood characteristics and physical victimisation in men and women with mental disorder
    Bhavsar, Vishal
    Sanyal, Jyoti
    Patel, Rashmi
    Shetty, Hitesh
    Velupillai, Sumithra
    Stewart, Robert
    Broadbent, Matthew
    MacCabe, James H.
    Das-Munshi, Jayati
    Howard, Louise M.
    [J]. BJPSYCH OPEN, 2020, 6 (04)