Balancing therapeutic effect and safety in ventilator parameter recommendation: An offline reinforcement learning approach

Cited by: 0
Authors
Zhang, Bo [1]
Qiu, Xihe [1]
Tan, Xiaoyu [2]
Affiliations
[1] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai 201620, Peoples R China
[2] INF Technol Shanghai Co Ltd, Shanghai 610101, Peoples R China
Funding
National Natural Science Foundation of China; Natural Science Foundation of Shanghai;
Keywords
Offline reinforcement learning; Fitted Q evaluation; Ventilator parameters recommendation; Agent exploration;
DOI
10.1016/j.engappai.2023.107784
CLC number (Chinese Library Classification)
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
Reinforcement learning (RL) is increasingly applied to recommending ventilator parameters, yet existing methods prioritize therapeutic effect over patient safety, leading to excessive exploration by the RL agent and posing risks to patients. To address this, we propose a novel offline RL approach that leverages existing clinical data for exploration and employs fitted Q evaluation (FQE) for policy evaluation, which minimizes patient risk compared to online evaluation. Our method introduces a variational auto-encoder Gumbel-Softmax (VAE-GS) model that learns the hidden relationship between a patient's physiological state and the ventilator parameters, constraining the agent's exploration space. Additionally, a noise network helps the agent fully explore the reachable space to find optimal ventilator parameters. Our approach significantly enhances safety, as evidenced by experiments on the Medical Information Mart for Intensive Care III (MIMIC-III) dataset. It outperforms existing algorithms, including deep deterministic policy gradient (DDPG), soft actor-critic (SAC), batch-constrained deep Q-learning (BCQ), conservative Q-learning (CQL), and closed-form policy improvement operators (CFPI), showing improvements of 76.9%, 82.8%, 23.5%, 49.1%, and 23.5%, respectively, while maintaining therapeutic effect.
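The paper does not publish code; the following is only a minimal PyTorch sketch of how a VAE with a Gumbel-Softmax latent, in the spirit of the VAE-GS component described above, could model the mapping from a patient's physiological state to plausible ventilator settings, so that the agent's exploration stays near clinically observed actions. All module names, dimensions, hyperparameters, and the uniform-prior KL term are our assumptions, not details from the paper.

    # Illustrative sketch only; shapes and losses are assumptions, not the paper's method.
    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    STATE_DIM, ACTION_DIM = 32, 4      # assumed sizes of state and ventilator-setting vectors
    N_CATS, CAT_DIM = 8, 10            # assumed categorical latent: 8 variables, 10 classes each

    class VAEGS(nn.Module):
        def __init__(self):
            super().__init__()
            # Encoder: (state, action) -> logits of a categorical latent
            self.enc = nn.Sequential(
                nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
                nn.Linear(128, N_CATS * CAT_DIM))
            # Decoder: (state, latent) -> reconstructed ventilator settings
            self.dec = nn.Sequential(
                nn.Linear(STATE_DIM + N_CATS * CAT_DIM, 128), nn.ReLU(),
                nn.Linear(128, ACTION_DIM))

        def forward(self, state, action, tau=1.0):
            logits = self.enc(torch.cat([state, action], dim=-1))
            logits = logits.view(-1, N_CATS, CAT_DIM)
            # Differentiable discrete sampling via the Gumbel-Softmax trick
            z = F.gumbel_softmax(logits, tau=tau, hard=False)
            recon = self.dec(torch.cat([state, z.flatten(1)], dim=-1))
            return recon, logits

    def loss_fn(recon, action, logits):
        # Reconstruction loss plus KL of the categorical posterior to a uniform prior:
        # KL(q || U) = sum_k q_k * (log q_k + log K)
        recon_loss = F.mse_loss(recon, action)
        q = F.softmax(logits, dim=-1)
        kl = (q * (q.clamp_min(1e-8).log() + math.log(CAT_DIM))).sum(-1).mean()
        return recon_loss + 0.5 * kl

    # Toy usage with random tensors (shapes only; not real clinical data)
    model = VAEGS()
    s, a = torch.randn(16, STATE_DIM), torch.randn(16, ACTION_DIM)
    recon, logits = model(s, a)
    loss_fn(recon, a, logits).backward()

Under this reading, a decoder trained this way keeps generated actions inside the support of the clinical dataset, which is the usual way generative action models are used to restrict exploration in offline RL (as in BCQ); the paper's noise network and FQE-based policy evaluation would sit on top of such a component.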
Pages: 13