An adversarial twin-agent inverse proximal policy optimization guided by model predictive control

Cited by: 0

Authors:

Gupta, Nikita [1, 4]

Kandath, Harikumar [2]

Kodamana, Hariprasad [1, 3, 4]

Affiliations:

[1] Indian Inst Technol Delhi, Dept Chem Engn, Hauz Khas, New Delhi 110016, India

[2] Int Inst Informat Technol Hyderabad, Hyderabad, India

[3] Indian Inst Technol Delhi, Yardi Sch Artificial Intelligence, Hauz Khas, New Delhi, India

[4] Indian Inst Technol Delhi Abu Dhabi, Abu Dhabi, U Arab Emirates

Source:

COMPUTERS & CHEMICAL ENGINEERING | 2025 / Volume 199

Keywords:

Reinforcement learning; Proximal Policy Optimization; Inverse Reinforcement Learning (IRL); Adversarial IRL (AIRL); Discriminator; CHO-CELLS; TEMPERATURE; PRODUCTIVITY; CHALLENGES; METABOLISM; PROGRESS; SYSTEMS; IMPACT; MPC;

DOI:

10.1016/j.compchemeng.2025.109124

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Subject classification code:

081203 ; 0835 ;

Abstract:

Reward design is a key challenge in reinforcement learning (RL) because it directly affects the effectiveness of the learned policies. Inverse Reinforcement Learning (IRL) addresses this problem by learning reward functions from expert trajectories. This study designs rewards within an Adversarial IRL (AIRL) framework, using expert trajectories generated by Model Predictive Control (MPC). Conversely, there are instances where a pre-defined reward function works well, indicating a potential trade-off between the two approaches. To exploit this trade-off, we propose a twin-agent reinforcement learning framework in which the first agent uses a pre-defined reward function, while the second agent learns its reward in the AIRL setting guided by MPC, with Proximal Policy Optimization (PPO) as the backbone (PPO-MPC-AIRL). The proposed algorithm is tested on a case study, namely monoclonal antibody (mAb) production in a bioreactor. The simulation results indicate that the proposed algorithm reduces the root mean square error (RMSE) of set-point tracking by 18.38% compared to nominal PPO.
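The headline figure in the abstract is a relative reduction in set-point tracking RMSE. As a minimal sketch of how such a figure is typically computed (not the authors' evaluation code), the Python below defines RMSE over a trajectory and the percentage reduction against a baseline. The function names `rmse` and `percent_reduction` and all numeric values are hypothetical and chosen only for illustration; they are not data from the paper.

```python
import math


def rmse(measured, setpoint):
    """Root mean square error between a measured trajectory and its set-point."""
    if len(measured) != len(setpoint) or len(measured) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [(y - r) ** 2 for y, r in zip(measured, setpoint)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def percent_reduction(baseline_rmse, new_rmse):
    """Relative RMSE reduction of a new controller versus a baseline, in percent."""
    if baseline_rmse <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Illustrative (made-up) trajectories, NOT results from the paper.
setpoint = [2.0, 2.0, 2.0, 2.0]
baseline_output = [0.0, 4.0, 0.0, 4.0]   # hypothetical baseline controller
proposed_output = [1.0, 3.0, 1.0, 3.0]   # hypothetical improved controller

baseline_error = rmse(baseline_output, setpoint)
proposed_error = rmse(proposed_output, setpoint)
reduction = percent_reduction(baseline_error, proposed_error)
```

Under this definition, a reported 18.38% reduction would mean the proposed controller's RMSE is about 0.8162 times the baseline RMSE, assuming both are evaluated on the same set-point trajectory.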


Pages: 9
