Mining 100 million notes to find homelessness and adverse childhood experiences: 2 case studies of rare and severe social determinants of health in electronic health records

Times cited: 65
Authors
Bejan, Cosmin A. [1]
Angiolillo, John [2]
Conway, Douglas
Nash, Robertson [2]
Shirey-Rice, Jana K. [3]
Lipworth, Loren [2]
Cronin, Robert M. [1, 2, 4]
Pulley, Jill [3]
Kripalani, Sunil [2]
Barkin, Shari [4]
Johnson, Kevin B. [1, 4]
Denny, Joshua C. [1, 2]
Affiliations
[1] Vanderbilt Univ, Med Ctr, Dept Biomed Informat, Nashville, TN USA
[2] Vanderbilt Univ, Med Ctr, Dept Med, Nashville, TN USA
[3] Vanderbilt Univ, Inst Clin & Translat Res, Med Ctr, Nashville, TN USA
[4] Vanderbilt Univ, Med Ctr, Dept Pediat, Nashville, TN 37232 USA
Keywords
text mining; homelessness; adverse childhood experiences; social determinants of health; EHR; CARE; DISORDERS; ALGORITHM; ADULTS; DEATH
DOI
10.1093/jamia/ocx059
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Understanding how to identify the social determinants of health from electronic health records (EHRs) could provide important insights into health or disease outcomes. We developed a methodology to capture 2 rare and severe social determinants of health, homelessness and adverse childhood experiences (ACEs), from a large EHR repository. We first constructed lexicons to capture homelessness and ACE phenotypic profiles. We employed word2vec and lexical associations to mine homelessness-related words. Next, using relevance feedback, we refined the 2 profiles with iterative searches over 100 million notes from the Vanderbilt EHR. Seven assessors manually reviewed the top-ranked results of 2544 patient visits relevant for homelessness and 1000 patients relevant for ACE. word2vec yielded better performance (area under the precision-recall curve [AUPRC] of 0.94) than lexical associations (AUPRC = 0.83) for extracting homelessness-related words. A comparative study of searches for the 2 phenotypes revealed higher performance for homelessness (AUPRC = 0.95) than for ACE (AUPRC = 0.79). A temporal analysis of the homeless population showed that the majority experienced chronic homelessness. Most ACE patients suffered sexual (70%) and/or physical (50.6%) abuse, with the top-ranked abuser keywords being "father" (21.8%) and "mother" (15.4%). The most prevalent associated conditions for homeless patients were lack of housing (62.8%) and tobacco use disorder (61.5%), while for ACE patients they were mental disorders (36.6%–47.6%). We provide an efficient solution for mining homelessness and ACE information from EHRs, which can facilitate large clinical and genetic studies of these social determinants of health.
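The abstract names two retrieval ideas without spelling them out. The first is relevance feedback, used to refine the search profiles. The second is AUPRC, used to score ranked results against manual review. The Python sketch below illustrates one classic form of each under stated assumptions. It is not the authors' implementation.

Relevance feedback is shown as the textbook Rocchio update over bag-of-words term weights. The `alpha`, `beta`, `gamma` defaults and the whitespace tokenizer `bag_of_words` are hypothetical choices. AUPRC is approximated by average precision over a judged ranked list. That score is normalized by the number of relevant items found, because only top-ranked results were reviewed and the total number of relevant notes is unknown.

```python
from collections import Counter
from typing import Dict, Sequence

# A term-weight vector: term -> weight.
Vector = Dict[str, float]


def bag_of_words(text: str) -> Vector:
    """Hypothetical tokenizer: lowercase, split on whitespace, count terms."""
    return {term: float(n) for term, n in Counter(text.lower().split()).items()}


def _centroid(docs: Sequence[Vector]) -> Vector:
    """Average term weights across documents (empty dict if no documents)."""
    if not docs:
        return {}
    total: Counter = Counter()
    for doc in docs:
        total.update(doc)  # adds weights term by term
    return {term: weight / len(docs) for term, weight in total.items()}


def rocchio_update(
    query: Vector,
    relevant: Sequence[Vector],
    nonrelevant: Sequence[Vector],
    alpha: float = 1.0,   # assumed weight on the original query
    beta: float = 0.75,   # assumed weight on the relevant-document centroid
    gamma: float = 0.15,  # assumed weight on the non-relevant centroid
) -> Vector:
    """Rocchio relevance feedback: move the query toward judged-relevant
    documents and away from judged-non-relevant ones; drop non-positive terms."""
    rel = _centroid(relevant)
    non = _centroid(nonrelevant)
    updated: Vector = {}
    for term in set(query) | set(rel) | set(non):
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel.get(term, 0.0)
                  - gamma * non.get(term, 0.0))
        if weight > 0:
            updated[term] = weight
    return updated


def average_precision(judgments: Sequence[bool]) -> float:
    """Average precision of a ranked list of relevance judgments
    (True = relevant), normalized by the relevant items actually found."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(judgments, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


if __name__ == "__main__":
    query = bag_of_words("homeless")
    relevant = [bag_of_words("patient is homeless living in shelter")]
    nonrelevant = [bag_of_words("patient denies homeless status")]
    print(rocchio_update(query, relevant, nonrelevant))
    print(average_precision([True, False, True, True, False]))
```

Consider a run that starts from the single-term query `homeless`. It marks "patient is homeless living in shelter" as relevant and "patient denies homeless status" as non-relevant. The update raises the weight of `homeless` to about 1.6, adds `shelter` at 0.75, and drops `denies`. A real clinical-notes system would also need negation handling and a better tokenizer. This sketch ignores both.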
Pages: 61–71
Number of pages: 11