Data-Driven Dynamic Multiobjective Optimal Control: An Aspiration-Satisfying Reinforcement Learning Approach

Citations: 20
Authors
Mazouchi, Majid [1 ]
Yang, Yongliang [2 ]
Modares, Hamidreza [1 ]
Affiliations
[1] Michigan State Univ, Mech Engn Dept, E Lansing, MI 48824 USA
[2] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Minist Educ, Key Lab Knowledge Automat Ind Proc, Beijing 10083, Peoples R China
Keywords
Multiobjective (MO) optimization; reinforcement learning (RL); satisficing control; sum-of-squares program; systems
DOI
10.1109/TNNLS.2021.3072571
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article presents an iterative data-driven algorithm for solving dynamic multiobjective (MO) optimal control problems arising in the control of nonlinear continuous-time systems. It is first shown that the Hamiltonian functional associated with each objective can be leveraged to compare the performance of admissible policies. Hamiltonian inequalities are then formulated whose satisfaction guarantees that the objectives' aspirations are met. Relaxed Hamilton-Jacobi-Bellman (HJB) equations, expressed as HJB inequalities, are then solved in a dynamic constrained MO framework to find Pareto optimal solutions, and the relation to the satisficing (good-enough) decision-making framework is established. A sum-of-squares (SOS)-based iterative algorithm is developed to solve the formulated aspiration-satisfying MO optimization. To obviate the need for complete knowledge of the system dynamics, a data-driven satisficing reinforcement learning approach is proposed that solves the SOS optimization problem in real time using only system trajectory data measured over a time interval. Finally, two simulation examples verify the analytical results of the proposed algorithm.
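As a generic illustration of the relaxed-HJB idea summarized in the abstract (the symbols below are standard for input-affine systems and are not taken from the article's own notation or derivations):

```latex
% Illustrative relaxed-HJB inequality for an input-affine system
% \dot{x} = f(x) + g(x)u with i-th running cost r_i(x,u):
\begin{align}
  H_i\bigl(x, u, \nabla V_i\bigr)
    &= \nabla V_i(x)^{\top}\bigl(f(x) + g(x)\,u\bigr) + r_i(x, u), \\
  H_i\bigl(x, u, \nabla V_i\bigr) \le 0 \;\; \forall x
  &\;\Longrightarrow\;
  J_i = \int_0^{\infty} r_i\bigl(x(t), u(t)\bigr)\,dt \;\le\; V_i\bigl(x(0)\bigr),
\end{align}
% so an aspiration level a_i on objective i is guaranteed whenever the
% easier-to-certify sufficient condition V_i(x(0)) <= a_i holds.
```

Replacing the HJB equality with an inequality of this form is what makes the problem amenable to SOS programming: for polynomial dynamics and value-function candidates, the inequality becomes an SOS feasibility constraint.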
Pages: 6183-6193
Page count: 11
References
33 records
[1]   Online Multi-Objective Model-Independent Adaptive Tracking Mechanism for Dynamical Systems [J].
Abouheaf, Mohammed ;
Gueaieb, Wail ;
Spinello, Davide .
ROBOTICS, 2019, 8 (04)
[2]  
Ahmadi, Amir Ali, 2018, Tech. Rep., P1
[3]  
Barrett L., 2008, P 25 INT C MACH LEAR, P41, DOI 10.1145/1390156.1390162
[4]   Performance Guarantees for Model-Based Approximate Dynamic Programming in Continuous Spaces [J].
Beuchat, Paul Nathaniel ;
Georghiou, Angelos ;
Lygeros, John .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2020, 65 (01) :143-158
[5]  
Bien, 2004, HDB LEARNING APPROXI, P433
[6]  
Caramia M., 2008, MULTIOBJECTIVE MANAG, P11, DOI 10.1007/978-1-84800-382-8
[7]   COMPUTATION OF PARETO OPTIMA IN STRUCTURAL DESIGN [J].
CARMICHAEL, DG .
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, 1980, 15 (06) :925-929
[8]  
Chen X, 2019, IEEE INT C INT ROBOT, P977, DOI 10.1109/IROS40897.2019.8968092
[9]   A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems [J].
Das, I ;
Dennis, JE .
STRUCTURAL OPTIMIZATION, 1997, 14 (01) :63-69
[10]  
Gambier A., 2011, 8th Asian Control Conference (ASCC 2011), P1084