Mining 100 million notes to find homelessness and adverse childhood experiences: 2 case studies of rare and severe social determinants of health in electronic health records

Cited by: 70
Authors
Bejan, Cosmin A. [1]
Angiolillo, John [2]
Conway, Douglas
Nash, Robertson [2]
Shirey-Rice, Jana K. [3]
Lipworth, Loren [2]
Cronin, Robert M. [1, 2, 4]
Pulley, Jill [3]
Kripalani, Sunil [2]
Barkin, Shari [4]
Johnson, Kevin B. [1, 4]
Denny, Joshua C. [1, 2]
Affiliations
[1] Vanderbilt Univ, Med Ctr, Dept Biomed Informat, Nashville, TN USA
[2] Vanderbilt Univ, Med Ctr, Dept Med, Nashville, TN USA
[3] Vanderbilt Univ, Inst Clin & Translat Res, Med Ctr, Nashville, TN USA
[4] Vanderbilt Univ, Med Ctr, Dept Pediat, Nashville, TN 37232 USA
Keywords
text mining; homelessness; adverse childhood experiences; social determinants of health; EHR; CARE; DISORDERS; ALGORITHM; ADULTS; DEATH
DOI
10.1093/jamia/ocx059
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Identifying the social determinants of health from electronic health records (EHRs) could provide important insights into health or disease outcomes. We developed a methodology to capture 2 rare and severe social determinants of health, homelessness and adverse childhood experiences (ACEs), from a large EHR repository. We first constructed lexicons to capture homelessness and ACE phenotypic profiles. We employed word2vec and lexical associations to mine homelessness-related words. Next, using relevance feedback, we refined the 2 profiles with iterative searches over 100 million notes from the Vanderbilt EHR. Seven assessors manually reviewed the top-ranked results of 2544 patient visits relevant for homelessness and 1000 patients relevant for ACE. For extracting homelessness-related words, word2vec yielded better performance (area under the precision-recall curve [AUPRC] of 0.94) than lexical associations (AUPRC = 0.83). A comparative study of searches for the 2 phenotypes revealed higher performance for homelessness (AUPRC = 0.95) than for ACE (AUPRC = 0.79). A temporal analysis of the homeless population showed that the majority experienced chronic homelessness. Most ACE patients suffered sexual (70%) and/or physical (50.6%) abuse, with the top-ranked abuser keywords being "father" (21.8%) and "mother" (15.4%). The most prevalent associated conditions for homeless patients were lack of housing (62.8%) and tobacco use disorder (61.5%), while for ACE patients they were mental disorders (36.6%-47.6%). We provide an efficient solution for mining homelessness and ACE information from EHRs, which can facilitate large clinical and genetic studies of these social determinants of health.
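The abstract reports its results as AUPRC over ranked outputs reviewed by assessors. As a minimal sketch, assuming binary relevance judgments over a ranked list, average precision is one common step-wise estimate of the area under the precision-recall curve. The function name and the sample judgments below are hypothetical, and the paper may have computed AUPRC differently.

```python
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision of a ranked list.

    relevance[i] is 1 if the item at rank i + 1 was judged relevant, else 0.
    This is a step-wise estimate of the area under the precision-recall curve.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at this rank, accumulated only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical assessor judgments for 8 ranked candidates (not data from the paper).
judgments = [1, 1, 0, 1, 0, 0, 1, 0]
print(round(average_precision(judgments), 3))
```

A ranking that places every relevant item first scores 1.0, so values such as 0.94 versus 0.83 indicate how early in the ranking the relevant items tend to appear.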
Pages: 61-71
Page count: 11