Temporally informed random forests for suicide risk prediction

Times cited: 14
Authors
Bayramli, Ilkin [1, 2]
Castro, Victor [3, 4]
Barak-Corren, Yuval [1]
Madsen, Emily M. [5, 6]
Nock, Matthew K. [4, 7, 8]
Smoller, Jordan W. [5, 6, 9]
Reis, Ben Y. [1, 9]
Affiliations
[1] Boston Childrens Hosp, Computat Hlth Informat Program, Predict Med Grp, 300 Longwood Ave, Boston, MA 02115 USA
[2] Harvard Univ, Cambridge, MA 02138 USA
[3] Mass Gen Brigham Res Informat Sci & Comp, Boston, MA USA
[4] Massachusetts Gen Hosp, Dept Psychiat, Boston, MA 02114 USA
[5] Massachusetts Gen Hosp, Ctr Genom Med, Psychiat & Neurodev Genet Unit, Boston, MA 02114 USA
[6] Massachusetts Gen Hosp, Ctr Precis Psychiat, Dept Psychiat, Boston, MA 02114 USA
[7] Harvard Univ, Dept Psychol, 33 Kirkland St, Cambridge, MA 02138 USA
[8] Franciscan Childrens, Mental Hlth Res Program, Brighton, MA USA
[9] Harvard Med Sch, Boston, MA 02115 USA
Keywords
random forest; suicide; modeling; temporal; clinical risk
DOI
10.1093/jamia/ocab225
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: Suicide is one of the leading causes of death worldwide, yet clinicians find it difficult to reliably identify individuals at high risk for suicide. Algorithmic approaches for suicide risk detection have been developed in recent years, mostly based on data from electronic health records (EHRs). Significant room for improvement remains in the way these models take advantage of temporal information to improve predictions.
Materials and Methods: We propose a temporally enhanced variant of the random forest (RF) model, Omni-Temporal Balanced Random Forests (OT-BRFs), that incorporates temporal information in every tree within the forest. We develop and validate this model using longitudinal EHRs and clinician notes from the Mass General Brigham Health System recorded between 1998 and 2018, and compare its performance to a baseline naive Bayes classifier and 2 standard versions of balanced RFs.
Results: Temporal variables were found to be associated with suicide risk: elevated suicide risk was observed in individuals with a higher total number of visits as well as in those with a low rate of visits over time, while lower suicide risk was observed in individuals with a longer period of EHR coverage. RF models were more accurate than naive Bayes classifiers at predicting suicide risk in advance (area under the receiver operating characteristic curve = 0.824 vs. 0.754, respectively). The proposed OT-BRF model performed best among all RF approaches, yielding a sensitivity of 0.339 at 95% specificity, compared to 0.290 and 0.286 for the other 2 RF models. Temporal variables were assigned high importance by the models that incorporated them.
Discussion: We demonstrate that temporal variables have an important role to play in suicide risk detection and that requiring their inclusion in all RF trees leads to increased predictive performance.
Conclusion: Integrating temporal information into risk prediction models helps the models interpret patient data in temporal context, improving predictive performance.
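The central idea in the abstract, that every tree in a balanced forest sees the temporal variables, can be illustrated with a minimal sketch. It only plans the rows and features each tree would train on; it does not fit any trees and is not the authors' implementation. The feature names, the function names, and the choice to fix one feature subset per tree (rather than per split) are all assumptions made for illustration.

```python
import random

# Hypothetical feature names; the abstract names the concepts, not the columns.
TEMPORAL = ["total_visits", "visit_rate", "ehr_coverage_days"]
OTHER = ["dx_code_a", "dx_code_b", "rx_code_a", "lab_code_a", "note_concept_a"]


def balanced_bootstrap(labels, rng):
    """Sample equal numbers of case (1) and control (0) row indices, with replacement."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))  # size of the minority class
    return [rng.choice(pos) for _ in range(n)] + [rng.choice(neg) for _ in range(n)]


def tree_feature_subset(temporal, other, n_random, rng):
    """Keep every temporal feature and add a random sample of the remaining ones."""
    return list(temporal) + rng.sample(list(other), min(n_random, len(other)))


def plan_forest(labels, temporal, other, n_trees=5, n_random=2, seed=0):
    """Return, for each tree, the row indices and feature names it would be trained on."""
    rng = random.Random(seed)
    return [
        {
            "rows": balanced_bootstrap(labels, rng),
            "features": tree_feature_subset(temporal, other, n_random, rng),
        }
        for _ in range(n_trees)
    ]


if __name__ == "__main__":
    toy_labels = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]  # imbalanced toy outcome
    for k, tree in enumerate(plan_forest(toy_labels, TEMPORAL, OTHER)):
        print(k, tree["features"], tree["rows"])
```

Each planned tree could then be fitted with any decision tree learner on its rows and features. Forcing the temporal features into every subset is what distinguishes this plan from an ordinary balanced forest, where a given tree may never see them.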
Pages: 62-71
Number of pages: 10