Mining 100 million notes to find homelessness and adverse childhood experiences: 2 case studies of rare and severe social determinants of health in electronic health records

Cited by: 65
Authors
Bejan, Cosmin A. [1]
Angiolillo, John [2]
Conway, Douglas
Nash, Robertson [2]
Shirey-Rice, Jana K. [3]
Lipworth, Loren [2]
Cronin, Robert M. [1, 2, 4]
Pulley, Jill [3]
Kripalani, Sunil [2]
Barkin, Shari [4]
Johnson, Kevin B. [1, 4]
Denny, Joshua C. [1, 2]
Affiliations
[1] Vanderbilt Univ, Med Ctr, Dept Biomed Informat, Nashville, TN USA
[2] Vanderbilt Univ, Med Ctr, Dept Med, Nashville, TN USA
[3] Vanderbilt Univ, Inst Clin & Translat Res, Med Ctr, Nashville, TN USA
[4] Vanderbilt Univ, Med Ctr, Dept Pediat, Nashville, TN 37232 USA
Keywords
text mining; homelessness; adverse childhood experiences; social determinants of health; EHR; CARE; DISORDERS; ALGORITHM; ADULTS; DEATH;
DOI
10.1093/jamia/ocx059
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Understanding how to identify the social determinants of health from electronic health records (EHRs) could provide important insights to understand health or disease outcomes. We developed a methodology to capture 2 rare and severe social determinants of health, homelessness and adverse childhood experiences (ACEs), from a large EHR repository. We first constructed lexicons to capture homelessness and ACE phenotypic profiles. We employed word2vec and lexical associations to mine homelessness-related words. Next, using relevance feedback, we refined the 2 profiles with iterative searches over 100 million notes from the Vanderbilt EHR. Seven assessors manually reviewed the top-ranked results of 2544 patient visits relevant for homelessness and 1000 patients relevant for ACE. word2vec yielded better performance (area under the precision-recall curve [AUPRC] of 0.94) than lexical associations (AUPRC = 0.83) for extracting homelessness-related words. A comparative study of searches for the 2 phenotypes revealed a higher performance achieved for homelessness (AUPRC = 0.95) than ACE (AUPRC = 0.79). A temporal analysis of the homeless population showed that the majority experienced chronic homelessness. Most ACE patients suffered sexual (70%) and/or physical (50.6%) abuse, with the top-ranked abuser keywords being "father" (21.8%) and "mother" (15.4%). Top prevalent associated conditions for homeless patients were lack of housing (62.8%) and tobacco use disorder (61.5%), while for ACE patients it was mental disorders (36.6%-47.6%). We provide an efficient solution for mining homelessness and ACE information from EHRs, which can facilitate large clinical and genetic studies of these social determinants of health.
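The abstract reports retrieval quality as area under the precision-recall curve (AUPRC) over ranked, manually reviewed results. As a rough illustration of how such a score can be obtained from a ranked list of relevance judgments, the sketch below computes average precision, one common estimator of AUPRC. It is not the authors' evaluation code: the function name `average_precision` and the toy labels are assumptions made for this example, and the paper may have used a different interpolation of the curve.

```python
from typing import Sequence


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Average precision of a ranked list (1 = relevant, 0 = not relevant).

    Precision is sampled at the rank of every relevant item and averaged,
    which approximates the area under the precision-recall curve.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    # With no relevant items the curve is undefined; return 0.0 by convention.
    return precision_sum / hits if hits else 0.0


# Hypothetical assessor judgments for five top-ranked results (toy data).
toy_ranking = [1, 1, 0, 1, 0]
print(round(average_precision(toy_ranking), 4))  # prints 0.9167
```

A ranking that places every relevant result first scores 1.0, so a higher value means relevant patients or visits are concentrated nearer the top of the search output.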
Pages: 61-71
Page count: 11