Multimodal fine-tuning of clinical language models for predicting COVID-19 outcomes

Cited by: 3
Authors
Henriksson, Aron [1]
Pawar, Yash [1]
Hedberg, Pontus [2, 3]
Naucler, Pontus [2, 3]
Affiliations
[1] Stockholm Univ, Dept Comp & Syst Sci DSV, Kista, Sweden
[2] Karolinska Inst, Dept Med Solna MedS, Div Infect Dis, Stockholm, Sweden
[3] Karolinska Univ Hosp, Dept Infect Dis, Stockholm, Sweden
Funding
Swedish Research Council
Keywords
Natural language processing; Machine learning; Language models; Clinical BERT; Multimodal learning; Electronic health records; Outcome prediction; COVID-19;
DOI
10.1016/j.artmed.2023.102695
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clinical prediction models tend only to incorporate structured healthcare data, ignoring information recorded in other data modalities, including free-text clinical notes. Here, we demonstrate how multimodal models that effectively leverage both structured and unstructured data can be developed for predicting COVID-19 outcomes. The models are trained end-to-end using a technique we refer to as multimodal fine-tuning, whereby a pre-trained language model is updated based on both structured and unstructured data. The multimodal models are trained and evaluated using a multicenter cohort of COVID-19 patients encompassing all encounters at the emergency department of six hospitals. Experimental results show that multimodal models, leveraging the notion of multimodal fine-tuning and trained to predict (i) 30-day mortality, (ii) safe discharge and (iii) readmission, outperform unimodal models trained using only structured or unstructured healthcare data on all three outcomes. Sensitivity analyses are performed to better understand how well the multimodal models perform on different patient groups, while an ablation study is conducted to investigate the impact of different types of clinical notes on model performance. We argue that multimodal models that make effective use of routinely collected healthcare data to predict COVID-19 outcomes may facilitate patient management and contribute to the effective use of limited healthcare resources.
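The end-to-end idea in the abstract, a text encoder updated by a loss that also depends on structured features, can be illustrated with a toy sketch. This is not the authors' architecture, which this record does not describe. It assumes a small embedding table with mean pooling as a stand-in for the pre-trained language model, concatenation as the fusion step, and a logistic head for a single binary outcome. All names (`ToyMultimodalModel`, `note`, `vitals`, `died`) and the values are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ToyMultimodalModel:
    """Hypothetical stand-in: an embedding table plays the role of the
    pre-trained language model; a logistic head sits on the fused vector."""

    def __init__(self, vocab_size, embed_dim, num_structured, seed=0):
        rng = random.Random(seed)
        self.embed = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                      for _ in range(vocab_size)]
        self.w = [rng.uniform(-0.1, 0.1)
                  for _ in range(embed_dim + num_structured)]
        self.b = 0.0

    def encode_text(self, token_ids):
        # Mean-pool token embeddings into one note representation.
        dim = len(self.embed[0])
        return [sum(self.embed[t][d] for t in token_ids) / len(token_ids)
                for d in range(dim)]

    def forward(self, token_ids, structured):
        # Fusion by concatenation: [text representation | structured features].
        fused = self.encode_text(token_ids) + list(structured)
        z = sum(wi * xi for wi, xi in zip(self.w, fused)) + self.b
        return sigmoid(z), fused

    def train_step(self, token_ids, structured, label, lr=0.5):
        p, fused = self.forward(token_ids, structured)
        loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))
        dz = p - label  # gradient of the cross-entropy loss w.r.t. the logit
        dim = len(self.embed[0])
        # Gradient reaching the text representation (uses pre-update weights).
        d_text = [dz * self.w[d] for d in range(dim)]
        # Update the head on the fused vector.
        for i, xi in enumerate(fused):
            self.w[i] -= lr * dz * xi
        self.b -= lr * dz
        # Update the encoder too: the loss depends on the structured features
        # through dz, so both modalities shape the text encoder's update.
        n = len(token_ids)
        for t in token_ids:
            for d in range(dim):
                self.embed[t][d] -= lr * d_text[d] / n
        return loss


# Hypothetical example: one tokenised note, two scaled structured values.
model = ToyMultimodalModel(vocab_size=10, embed_dim=4, num_structured=2)
note, vitals, died = [1, 3, 3, 7], [0.8, -0.2], 1
losses = [model.train_step(note, vitals, died) for _ in range(20)]
```

In this sketch only the embeddings of tokens that occur in the note change during training. A unimodal text baseline would correspond to dropping `vitals` from the fused vector.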
Pages: 11