Extracting social determinants of health from clinical note text with classification and sequence-to-sequence approaches

Cited by: 8
Authors
Romanowski, Brian [1]
Ben Abacha, Asma [2]
Fan, Yadan [1]
Affiliations
[1] Nuance Communications, One Wayside Rd, Burlington, MA 01803 USA
[2] Microsoft, Redmond, WA USA
Keywords
social determinants of health; information extraction; natural language processing; clinical notes; deep learning
DOI
10.1093/jamia/ocad071
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: Social determinants of health (SDOH) are nonmedical factors that can influence health outcomes. This paper seeks to extract SDOH from clinical texts in the context of the National NLP Clinical Challenges (n2c2) 2022 Track 2 Task.
Materials and Methods: Annotated and unannotated data from the Medical Information Mart for Intensive Care III (MIMIC-III) corpus, the Social History Annotation Corpus, and an in-house corpus were used to develop 2 deep learning models that used classification and sequence-to-sequence (seq2seq) approaches.
Results: The seq2seq approach had the highest overall F1 scores in the challenge's 3 subtasks: 0.901 on the extraction subtask, 0.774 on the generalizability subtask, and 0.889 on the learning transfer subtask.
Discussion: Both approaches rely on SDOH event representations that were designed to be compatible with transformer-based pretrained models, with the seq2seq representation supporting an arbitrary number of overlapping and sentence-spanning events. Models with adequate performance could be produced quickly, and the remaining mismatch between representation and task requirements was then addressed in postprocessing. The classification approach used rules to generate entity relationships from its sequence of token labels, while the seq2seq approach used constrained decoding and a constraint solver to recover entity text spans from its sequence of potentially ambiguous tokens.
Conclusion: We proposed 2 different approaches to extract SDOH from clinical texts with high accuracy. However, accuracy suffers on text from new healthcare institutions not present in the training data, and thus generalization remains an important topic for future study.
Pages: 1448-1455
Page count: 8