Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning

Times Cited: 0
Authors
Liu, Yingjun [1 ]
Liu, Fuchun [1 ]
Huang, Renwei [2 ]
Affiliations
[1] Guangdong Univ Technol, Sch Comp Sci & Technol, Guangzhou 510006, Peoples R China
[2] Guangzhou City Polytech, Coll Elect & Informat Engn, Guangzhou 511370, Peoples R China
Keywords
Optimal control; Imitation learning; Deep reinforcement learning; Model
DOI
10.1038/s41598-025-04417-2
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline Classification Codes
07; 0710; 09
Abstract
Supervisory control theory (SCT) is widely used as a safeguard mechanism in the control of discrete event systems (DESs). In complex continuous systems, however, the supervised control problem is quite different: to keep system behavior from violating specifications, one must contend with high-dimensional continuous state and action spaces, where automaton languages are no longer suitable for describing specification information; this remains a challenge for the control of real physical systems. Reinforcement learning (RL) learns complex decisions automatically through trial and error, but it requires precisely designed reward functions built on domain knowledge. For complex scenarios where such a reward function cannot be constructed, or where only sparse rewards are available, this paper proposes a novel supervised optimal control framework based on trajectory imitation (TI) and reinforcement learning. First, behavior cloning (BC) is adopted to pre-train the policy model on a small number of human demonstrations. Second, generative adversarial imitation learning (GAIL) is applied to capture the implicit characteristics of the demonstration data. After the primary and implicit features have been extracted by these steps, a demonstration-based RL algorithm is designed that adds the demonstration data to the RL replay buffer and augments the loss function, pushing system performance toward its maximum potential. Finally, the proposed method is validated through multiple simulation experiments on object-relocation and tool-use tasks with dexterous multi-fingered hands. On the more complex tool-use task, the proposed approach reduces convergence time by 19.7% compared with the latest method. For both tasks, the resulting policies display natural movements and show higher robustness than the baseline model.
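The demonstration-based RL step described above, adding demonstration data to the replay buffer and augmenting the loss function, can be sketched in plain Python. The buffer class, the fixed demo-sampling fraction, and the BC weight `lam` below are illustrative assumptions for exposition, not the paper's exact implementation.

```python
import numpy as np

class DemoReplayBuffer:
    """Replay buffer that keeps human demonstrations alongside agent
    transitions and samples mixed mini-batches with a fixed demo fraction.
    (Illustrative sketch; the paper's buffer may differ.)"""
    def __init__(self, demo_fraction=0.25, seed=0):
        self.demos, self.agent = [], []
        self.demo_fraction = demo_fraction
        self.rng = np.random.default_rng(seed)

    def add_demo(self, state, action):
        self.demos.append((state, action))

    def add_agent(self, state, action):
        self.agent.append((state, action))

    def sample(self, batch_size):
        # Reserve part of every batch for demonstration transitions.
        n_demo = max(1, int(batch_size * self.demo_fraction))
        n_agent = batch_size - n_demo
        idx_d = self.rng.integers(0, len(self.demos), n_demo)
        idx_a = self.rng.integers(0, len(self.agent), n_agent)
        batch = [self.demos[i] for i in idx_d] + [self.agent[i] for i in idx_a]
        flags = np.array([1] * n_demo + [0] * n_agent)  # 1 marks demo samples
        return batch, flags

def augmented_loss(policy_actions, batch_actions, td_loss, flags, lam=0.1):
    """RL loss plus a behavior-cloning term evaluated only on demo samples:
    L = L_RL + lam * mean((pi(s) - a_demo)^2). `lam` is an assumed weight."""
    diffs = policy_actions - batch_actions
    bc_term = np.mean(diffs[flags == 1] ** 2)
    return td_loss + lam * bc_term
```

In this sketch the BC term pulls the policy toward demonstrated actions only on the demonstration portion of each batch, so the RL objective still dominates once the agent's own experience outgrows the demonstrations.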
Pages: 13