Mining 100 million notes to find homelessness and adverse childhood experiences: 2 case studies of rare and severe social determinants of health in electronic health records

Cited by: 70
Authors
Bejan, Cosmin A. [1]
Angiolillo, John [2]
Conway, Douglas
Nash, Robertson [2]
Shirey-Rice, Jana K. [3]
Lipworth, Loren [2]
Cronin, Robert M. [1, 2, 4]
Pulley, Jill [3]
Kripalani, Sunil [2]
Barkin, Shari [4]
Johnson, Kevin B. [1, 4]
Denny, Joshua C. [1, 2]
Affiliations
[1] Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
[2] Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
[3] Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA
[4] Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
Keywords
text mining; homelessness; adverse childhood experiences; social determinants of health; EHR; care; disorders; algorithm; adults; death
DOI
10.1093/jamia/ocx059
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Understanding how to identify the social determinants of health from electronic health records (EHRs) could provide important insights into health or disease outcomes. We developed a methodology to capture 2 rare and severe social determinants of health, homelessness and adverse childhood experiences (ACEs), from a large EHR repository. We first constructed lexicons to capture homelessness and ACE phenotypic profiles, using word2vec and lexical associations to mine homelessness-related words. Next, using relevance feedback, we refined the 2 profiles with iterative searches over 100 million notes from the Vanderbilt EHR. Seven assessors manually reviewed the top-ranked results for 2544 patient visits relevant to homelessness and 1000 patients relevant to ACEs. For extracting homelessness-related words, word2vec performed better (area under the precision-recall curve [AUPRC] = 0.94) than lexical associations (AUPRC = 0.83). A comparison of searches for the 2 phenotypes showed higher performance for homelessness (AUPRC = 0.95) than for ACEs (AUPRC = 0.79). A temporal analysis of the homeless population showed that the majority experienced chronic homelessness. Most ACE patients suffered sexual (70%) and/or physical (50.6%) abuse, with the top-ranked abuser keywords being "father" (21.8%) and "mother" (15.4%). The most prevalent associated conditions for homeless patients were lack of housing (62.8%) and tobacco use disorder (61.5%), whereas for ACE patients they were mental disorders (36.6%-47.6%). We provide an efficient solution for mining homelessness and ACE information from EHRs, which can facilitate large clinical and genetic studies of these social determinants of health.
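The abstract does not say which relevance-feedback algorithm refined the profiles. As a hedged illustration only, the sketch below uses the textbook Rocchio update, a swapped-in technique that is not necessarily the authors' method: a query term-weight vector is moved toward notes that reviewers judged relevant and away from notes judged non-relevant, and the notes are then re-ranked. The whitespace tokenization, the weights `alpha`, `beta` and `gamma`, the helper names, and the three toy notes are all assumptions made for illustration, not data or code from the study.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Raw term counts for one note (naive lowercase whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: Counter, notes: list[str]) -> list[int]:
    """Return note indices ordered from most to least similar to the query."""
    scores = [cosine(query, tf_vector(n)) for n in notes]
    return sorted(range(len(notes)), key=lambda i: scores[i], reverse=True)


def rocchio(query: Counter, relevant: list[str], nonrelevant: list[str],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Counter:
    """One Rocchio update (assumed weights): move the query toward the centroid
    of notes judged relevant and away from the centroid of non-relevant notes."""
    updated = Counter({t: alpha * w for t, w in query.items()})
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for t, w in tf_vector(doc).items():
                updated[t] += coef * w / len(docs)
    # Keep only positively weighted terms, a common practical choice.
    return Counter({t: w for t, w in updated.items() if w > 0})


# Toy, invented notes; not data from the study.
notes = [
    "patient is homeless and sleeps in a shelter",
    "patient lives at home with spouse",
    "currently living under a bridge no stable housing",
]
query = Counter({"homeless": 1.0})
first = rank(query, notes)  # only note 0 matches the seed term
# Simulated reviewer judgments: notes 0 and 2 relevant, note 1 not.
expanded = rocchio(query, [notes[0], notes[2]], [notes[1]])
second = rank(expanded, notes)
print(first, second)  # prints [0, 1, 2] [0, 2, 1]
```

After one feedback round, note 2, which never mentions "homeless", rises above note 1 because the expanded query now carries terms such as "housing" and "bridge" from the notes marked relevant; in a real corpus, unreviewed notes sharing those terms would rise in the next iterative search in the same way.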
Pages: 61-71
Page count: 11