Joint Optimization of Concave Scalarized Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

Times cited: 0

Authors:

Bai, Qinbo [1]

Agarwal, Mridul [1]

Aggarwal, Vaneet [1]

Affiliation:

[1] Purdue University, West Lafayette, IN 47907, USA

Source:

Journal of Artificial Intelligence Research | 2022 / Vol. 74

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term objectives. A policy-gradient-based, model-free algorithm is proposed for this problem. To estimate the gradient, an asymptotically biased estimator is proposed. The proposed algorithm is shown to converge to within $\epsilon$ of the global optimum after sampling $\mathcal{O}\left(\frac{M^4\sigma^2}{(1-\gamma)^8\epsilon^4}\right)$ trajectories, where $\gamma$ is the discount factor and $M$ is the number of agents, thus achieving the same dependence on $\epsilon$ as the policy gradient algorithm for standard reinforcement learning.
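The abstract does not spell out the estimator, so the following Python sketch only illustrates the general idea under our own assumptions. By the chain rule, $\nabla_\theta f(J(\theta)) = \sum_{m} \frac{\partial f}{\partial J_m}(J(\theta))\,\nabla_\theta J_m(\theta)$. If $J$ is replaced by a sample-mean estimate $\hat J$ before evaluating $\partial f/\partial J_m$, the gradient estimate is biased whenever $f$ is non-linear, because in general $\mathbb{E}[\partial f/\partial J_m(\hat J)] \neq \partial f/\partial J_m(J)$; this bias typically shrinks as more trajectories are sampled. The toy reward table, the scalarization $f(J)=\sum_m \log J_m$, the softmax policy, the REINFORCE-style per-objective gradient with a batch-mean baseline, and every function name below (`estimate_gradient`, `train`, `exact_objectives`, ...) are hypothetical choices, not the paper's algorithm.

```python
import math
import random

# HYPOTHETICAL toy problem (not from the paper): one state, 3 actions,
# 2 objectives. Action 0 rewards objective 0, action 1 rewards objective 1,
# action 2 gives a weaker compromise reward to both.
REWARDS = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4)]
GAMMA, HORIZON = 0.9, 10


def softmax(theta):
    m = max(theta)
    exps = [math.exp(x - m) for x in theta]
    total = sum(exps)
    return [e / total for e in exps]


def scalarization(J):
    """Assumed concave scalarization f(J) = sum_m log J_m."""
    return sum(math.log(j) for j in J)


def scalarization_grad(J):
    """df/dJ_m = 1 / J_m, floored to avoid division by zero."""
    return [1.0 / max(j, 1e-8) for j in J]


def estimate_gradient(theta, n_traj, rng):
    """Plug-in (hence biased) estimate of grad_theta f(J(theta))."""
    probs = softmax(theta)
    n_act, n_obj = len(theta), len(REWARDS[0])
    trajs = []
    for _ in range(n_traj):
        actions = rng.choices(range(n_act), weights=probs, k=HORIZON)
        # togo[t][m] = sum over t' >= t of GAMMA**t' * r_m(a_t')
        togo = [[0.0] * n_obj for _ in range(HORIZON + 1)]
        for t in reversed(range(HORIZON)):
            for m in range(n_obj):
                togo[t][m] = togo[t + 1][m] + GAMMA ** t * REWARDS[actions[t]][m]
        trajs.append((actions, togo))
    # Sample-mean estimate of the discounted objectives J_m(theta).
    J_hat = [sum(tg[0][m] for _, tg in trajs) / n_traj for m in range(n_obj)]
    # Batch-mean baseline per time step and objective (variance reduction).
    base = [[sum(tg[t][m] for _, tg in trajs) / n_traj for m in range(n_obj)]
            for t in range(HORIZON)]
    # Chain-rule weights df/dJ_m evaluated at the *estimate* J_hat: source of bias.
    w = scalarization_grad(J_hat)
    grad = [0.0] * n_act
    for actions, togo in trajs:
        for t, a in enumerate(actions):
            adv = sum(w[m] * (togo[t][m] - base[t][m]) for m in range(n_obj))
            for k in range(n_act):
                # d/dtheta_k of log softmax(theta)[a] is 1[a == k] - probs[k]
                grad[k] += ((1.0 if a == k else 0.0) - probs[k]) * adv
    return [g / n_traj for g in grad]


def exact_objectives(theta):
    """Closed-form J_m for this toy problem, used only for evaluation."""
    probs = softmax(theta)
    scale = (1 - GAMMA ** HORIZON) / (1 - GAMMA)
    return [scale * sum(p * r[m] for p, r in zip(probs, REWARDS))
            for m in range(len(REWARDS[0]))]


def train(iters=200, n_traj=64, lr=0.5, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * len(REWARDS)
    for _ in range(iters):
        grad = estimate_gradient(theta, n_traj, rng)
        theta = [th + lr * g for th, g in zip(theta, grad)]  # gradient ascent
    return theta
```

Calling `theta = train()` and inspecting `softmax(theta)` should show almost all probability on actions 0 and 1 in roughly equal shares, because $\sum_m \log J_m$ heavily penalizes leaving either objective near zero. A linear weighting $J_0 + J_1$ would, in this toy table, be indifferent among all mixtures of actions 0 and 1, including the deterministic policies that ignore one objective entirely; this is the kind of behavior a concave scalarization is meant to avoid.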


Pages: 1565-1597

Page count: 33
