Predicting early psychiatric readmission with natural language processing of narrative discharge summaries

Cited by: 119
Authors
Rumshisky, A. [1, 2]
Ghassemi, M. [1]
Naumann, T. [1]
Szolovits, P. [1]
Castro, V. M. [3, 4, 5, 6]
McCoy, T. H. [3, 4, 5]
Perlis, R. H. [3, 4, 5]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Univ Massachusetts Lowell, Dept Comp Sci, Lowell, MA USA
[3] Massachusetts Gen Hosp, Ctr Expt Drugs & Diagnost, Boston, MA 02114 USA
[4] Massachusetts Gen Hosp, Dept Psychiat, Simches Res Bldg,185 Cambridge St,6th Floor, Boston, MA 02114 USA
[5] Massachusetts Gen Hosp, Ctr Human Genet Res, Boston, MA 02114 USA
[6] Partners HealthCare Syst, Partners Res Informat Syst & Comp, Boston, MA USA
Keywords
ELECTRONIC MEDICAL-RECORDS; IDENTIFICATION; DEPRESSION;
DOI
10.1038/tp.2015.182
Chinese Library Classification (CLC) number
R749 [Psychiatry];
Discipline classification code
100205;
Abstract
The ability to predict psychiatric readmission would facilitate the development of interventions to reduce this risk, a major driver of psychiatric health-care costs. The symptoms or characteristics of illness course necessary to develop reliable predictors are not available in coded billing data, but may be present in narrative electronic health record (EHR) discharge summaries. We identified a cohort of individuals admitted to a psychiatric inpatient unit between 1994 and 2012 with a principal diagnosis of major depressive disorder, and extracted inpatient psychiatric discharge narrative notes. Using these data, we trained a 75-topic Latent Dirichlet Allocation (LDA) model, a form of natural language processing, which identifies groups of words associated with topics discussed in a document collection. The cohort was randomly split to derive a training (70%) and testing (30%) data set, and we trained separate support vector machine models for baseline clinical features alone, baseline features plus common individual words and the above plus topics identified from the 75-topic LDA model. Of 4687 patients with inpatient discharge summaries, 470 were readmitted within 30 days. The 75-topic LDA model included topics linked to psychiatric symptoms (suicide, severe depression, anxiety, trauma, eating/weight and panic) and major depressive disorder comorbidities (infection, postpartum, brain tumor, diarrhea and pulmonary disease). By including LDA topics, prediction of readmission, as measured by area under receiver-operating characteristic curves in the testing data set, was improved from baseline (area under the curve 0.618) to baseline + 1000 words (0.682) to baseline + 75 topics (0.784). Inclusion of topics derived from narrative notes allows more accurate discrimination of individuals at high risk for psychiatric readmission in this cohort.
Topic modeling and related approaches offer the potential to improve prediction using EHRs, if generalizability can be established in other clinical cohorts.
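The abstract reports two mechanical steps that are easy to illustrate: a random 70%/30% split of the cohort and evaluation by area under the receiver-operating characteristic curve (AUC). The following is a minimal sketch of both, not the authors' implementation. The function names `split_cohort` and `roc_auc` are hypothetical, the seed is arbitrary, and the toy labels and scores are invented for illustration and are not study data. The AUC is computed in its pairwise form: the probability that a randomly chosen readmitted patient receives a higher risk score than a randomly chosen non-readmitted patient.

```python
import random


def split_cohort(ids, train_fraction=0.7, seed=0):
    """Randomly split patient identifiers into training and testing sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    (label 1, e.g. readmitted within 30 days) gets a higher score than
    a randomly chosen negative case (label 0); ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Cohort size taken from the abstract; integer ids stand in for patients.
    train, test = split_cohort(range(4687))
    print(len(train), len(test))

    # Invented toy example: 1 = readmitted, 0 = not readmitted.
    toy_labels = [1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.4, 0.35, 0.2, 0.1]
    print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported values (0.618, 0.682, 0.784) should be read. The pairwise loop is quadratic in cohort size; a rank-based formulation would be preferable for large test sets.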
Pages: e921
Page count: 5