An adversarial twin-agent inverse proximal policy optimization guided by model predictive control

Cited by: 0
Authors
Gupta, Nikita [1,4]
Kandath, Harikumar [2]
Kodamana, Hariprasad [1,3,4]
Affiliations
[1] Indian Inst Technol Delhi, Dept Chem Engn, Hauz Khas, New Delhi 110016, India
[2] Int Inst Informat Technol Hyderabad, Hyderabad, India
[3] Indian Inst Technol Delhi, Yardi Sch Artificial Intelligence, Hauz Khas, New Delhi, India
[4] Indian Inst Technol Delhi Abu Dhabi, Abu Dhabi, U Arab Emirates
Keywords
Reinforcement learning; Proximal Policy Optimization; Inverse Reinforcement Learning (IRL); Adversarial IRL (AIRL); Discriminator; CHO-CELLS; TEMPERATURE; PRODUCTIVITY; CHALLENGES; METABOLISM; PROGRESS; SYSTEMS; IMPACT; MPC;
DOI
10.1016/j.compchemeng.2025.109124
Chinese Library Classification (CLC)
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
Reward design is a key challenge in reinforcement learning (RL), as it directly affects the effectiveness of the learned policies. Inverse Reinforcement Learning (IRL) addresses this problem by learning reward functions from expert trajectories. This study designs rewards with an Adversarial IRL (AIRL) framework using expert trajectories generated by Model Predictive Control (MPC). Conversely, there are also instances where a pre-defined reward function works well, indicating a potential trade-off between the two approaches. To exploit this trade-off, we propose a twin-agent reinforcement learning framework in which the first agent uses a pre-defined reward function, while the second agent learns its reward in the AIRL setting guided by MPC, with Proximal Policy Optimization (PPO) as the backbone (PPO-MPC-AIRL). The performance of the proposed algorithm is tested on a case study of monoclonal antibody (mAb) production in a bioreactor. Simulation results indicate that the proposed algorithm reduces the root mean square error (RMSE) of set-point tracking by 18.38 % compared to the nominal PPO.
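For context on the two reward signals the abstract refers to, a minimal PyTorch sketch is given below. It assumes a sigmoid discriminator and a quadratic set-point tracking penalty; the names (Discriminator, predefined_reward, airl_reward, discriminator_loss), network sizes, and reward forms are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    # Binary classifier over (state, action) pairs: MPC expert vs. policy transitions.
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Returns the logit of P(expert | state, action).
        return self.net(torch.cat([state, action], dim=-1))

def predefined_reward(y: float, y_sp: float) -> float:
    # Agent 1: hand-crafted reward, e.g. negative squared set-point tracking error.
    return -(y - y_sp) ** 2

def airl_reward(disc: Discriminator, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    # Agent 2: AIRL-style reward log D - log(1 - D); with a sigmoid discriminator
    # this equals the raw logit.
    return disc(state, action)

def discriminator_loss(disc, expert_s, expert_a, policy_s, policy_a):
    # Cross-entropy training signal: MPC expert transitions labelled 1,
    # PPO policy transitions labelled 0.
    bce = nn.BCEWithLogitsLoss()
    e_logit = disc(expert_s, expert_a)
    p_logit = disc(policy_s, policy_a)
    return (bce(e_logit, torch.ones_like(e_logit))
            + bce(p_logit, torch.zeros_like(p_logit)))

Both agents can then be updated with standard PPO, the first on predefined_reward and the second on airl_reward, which is the general structure the abstract describes.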
Pages: 9