A model-based hybrid soft actor-critic deep reinforcement learning algorithm for optimal ventilator settings

Cited: 25
Authors
Chen, Shaotao [1 ]
Qiu, Xihe [1 ]
Tan, Xiaoyu [2 ]
Fang, Zhijun [1 ]
Jin, Yaochu [3 ]
Affiliations
[1] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai, Peoples R China
[2] Ant Grp, Hangzhou, Peoples R China
[3] Bielefeld Univ, Fac Technol, D-33619 Bielefeld, Germany
Funding
National Natural Science Foundation of China;
Keywords
Optimal ventilator settings; Reinforcement learning; Hybrid action space; Optimal strategy; Machine learning; SYSTEM;
DOI
10.1016/j.ins.2022.08.028
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
A ventilator is a device that mechanically assists in pumping air into the lungs, providing life-saving supportive therapy in an intensive care unit (ICU). In clinical scenarios, each patient has unique physiological circumstances and specific respiratory diseases, and thus requires individualized ventilator settings. Long-term supervision by experienced clinicians is essential to precisely adjust ventilator parameters and make timely modifications. Moreover, a tiny clinical error can result in severe lung injury, induce multi-system organ dysfunction, and increase mortality. To reduce the workload of clinicians and prevent medical errors, machine learning (ML), or more specifically, reinforcement learning (RL), methods have been developed to automatically adjust the ventilator's parameters and select optimal strategies. However, ventilator settings contain both continuous (e.g., frequency) and discrete (e.g., ventilation mode) parameters, making it challenging for conventional RL-based approaches to handle such problems. Meanwhile, it is necessary to develop models with high data efficiency to overcome medical data insufficiency. In this paper, we propose a model-based hybrid soft actor-critic (MHSAC) algorithm built on the classic soft actor-critic (SAC) and model-based policy optimization (MBPO) frameworks. This algorithm can learn both continuous and discrete policies according to the current and predicted state of the patient's physiological information with high data efficiency. Results reveal that our proposed model significantly outperforms the baseline models, achieving superior efficiency and high accuracy in the OpenAI Gym simulation environment.
Our proposed model is capable of resolving mixed action space problems, enhancing data efficiency, and accelerating convergence; it can generate practical optimal ventilator settings, minimize possible medical errors, and provide clinical decision support. (c) 2022 Elsevier Inc. All rights reserved.
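The hybrid action space described in the abstract pairs a categorical draw (ventilation mode) with a squashed Gaussian draw (a continuous parameter such as frequency), which is the core sampling step a hybrid SAC policy head performs. The sketch below illustrates that sampling step only; the mode names, parameter ranges, and function names are illustrative assumptions, not taken from the paper.

```python
import math
import random

# Illustrative ventilation modes; the paper's actual discrete options may differ.
MODES = ["volume-control", "pressure-control", "pressure-support"]

def softmax(logits):
    """Convert raw logits into a categorical distribution over modes."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_hybrid_action(mode_logits, freq_mean, freq_log_std, rng=random):
    """Sample one (discrete, continuous) action pair, as a hybrid
    SAC-style policy head would: a categorical draw for the mode and a
    reparameterized Gaussian draw, tanh-squashed into a plausible
    breaths-per-minute range, for the frequency."""
    probs = softmax(mode_logits)
    mode_idx = rng.choices(range(len(MODES)), weights=probs)[0]
    # Reparameterization: mean + std * noise, then squash and rescale
    # into an assumed clinically plausible range [8, 35] breaths/min.
    z = freq_mean + math.exp(freq_log_std) * rng.gauss(0.0, 1.0)
    freq = 8.0 + (35.0 - 8.0) * (math.tanh(z) + 1.0) / 2.0
    return MODES[mode_idx], freq
```

In the full algorithm these logits and Gaussian parameters would be produced by the actor network, and the joint discrete-continuous likelihood would enter the entropy-regularized SAC objective; this sketch shows only why a single policy head can emit both action types at once.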
Pages: 47-64
Number of pages: 18