Strategies for building robust prediction models using data unavailable at prediction time

Cited by: 6
Authors
Yang, Haoyu [1]
Tourani, Roshan [2]
Zhu, Ying [2]
Kumar, Vipin [1]
Melton, Genevieve B. [2,3]
Steinbach, Michael [1]
Simon, Gyorgy [2,4]
Affiliations
[1] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN USA
[2] Univ Minnesota, Inst Hlth Informat, Minneapolis, MN USA
[3] Univ Minnesota, Dept Surg, Box 242 UMHC, Minneapolis, MN 55455 USA
[4] Univ Minnesota, Dept Internal Med, Minneapolis, MN USA
Keywords
artificial intelligence; hospital-acquired infection; machine learning; predictive modeling; sepsis
DOI
10.1093/jamia/ocab229
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: Hospital-acquired infections (HAIs) are associated with significant morbidity, mortality, and prolonged hospital length of stay. Risk prediction models based on pre- and intraoperative data have been proposed to assess the risk of HAIs at the end of surgery, but the performance of these models lags behind that of HAI detection models based on postoperative data. Postoperative data are more predictive than pre- or intraoperative data because they are closer in time to the outcomes, but they are unavailable when the risk models are applied (at the end of surgery). The objective is to study whether such data, which are temporally unavailable at prediction time (TUP) and thus cannot directly enter the model, can be used to improve the performance of the risk model. Materials and Methods: An extensive array of 12 methods based on logistic/linear regression and deep learning was used to incorporate the TUP data through a variety of intermediate representations of the data. Because the different HAI outcomes have a hierarchical structure, a comparison of single-task and multi-task learning frameworks is also presented. Results and Discussion: The use of TUP data was always advantageous: the baseline methods, which cannot utilize TUP data, never achieved the top performance. The relative performance of the different models varied across the outcomes. Regarding the intermediate representation, we found that its complexity was key and that incorporating label information was helpful. Conclusions: Using TUP data significantly helped predictive performance irrespective of model complexity.
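The idea of routing TUP data into a model through an intermediate representation can be illustrated with a minimal sketch. The code below is not taken from the paper and is not claimed to be one of its 12 methods. It assumes one simple, label-informed variant: a "teacher" logistic regression is fit on the postoperative (TUP) features, its logit serves as the intermediate representation, and a "student" linear regression learns to reproduce that representation from the features available at the end of surgery. The synthetic data, the function names (`fit_logistic`, `fit_linear`, `scores`), and all parameter values are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def add_bias(A):
    # Append a constant column so the last weight acts as the intercept.
    return np.hstack([A, np.ones((A.shape[0], 1))])

def fit_logistic(A, y, lr=0.1, steps=500):
    # Plain gradient-descent logistic regression (illustrative, not tuned).
    A1 = add_bias(A)
    w = np.zeros(A1.shape[1])
    for _ in range(steps):
        p = sigmoid(A1 @ w)
        w -= lr * A1.T @ (p - y) / len(y)
    return w

def fit_linear(A, target):
    # Ordinary least squares via numpy.linalg.lstsq.
    w, *_ = np.linalg.lstsq(add_bias(A), target, rcond=None)
    return w

def scores(A, w):
    # Linear score (logit scale) for features A under weights w.
    return add_bias(A) @ w

# Synthetic data (assumption): X is known at the end of surgery,
# Z is postoperative (TUP) data that depends partly on X, y is the outcome.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
mix = np.array([[1.0, 0.5], [-0.5, 1.0]])
Z = X[:, :2] @ mix + rng.normal(scale=0.5, size=(n, 2))
y = (rng.random(n) < sigmoid(1.5 * Z[:, 0] - Z[:, 1] - 1.0)).astype(float)

train, test = slice(0, 1500), slice(1500, n)

# Step 1: teacher model on TUP data; its logit is the intermediate
# representation and carries label information.
w_teacher = fit_logistic(Z[train], y[train])
r_train = scores(Z[train], w_teacher)

# Step 2: student model maps prediction-time features to that representation.
w_student = fit_linear(X[train], r_train)

# Step 3: at prediction time only X is used; Z never enters the model.
risk_test = sigmoid(scores(X[test], w_student))
```

The TUP features are used only during training. Whether a scheme like this beats a baseline fit directly on `X` depends on the data, and the sketch makes no claim about that.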
Pages: 72-79
Number of pages: 8