Joint Optimization of Concave Scalarized Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

Cited by: 0
Authors
Bai, Qinbo [1]
Agarwal, Mridul [1]
Aggarwal, Vaneet [1]
Affiliations
[1] Purdue University, West Lafayette, IN 47907, USA
Source
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH | 2022 / Vol. 74
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term objectives. A policy-gradient-based, model-free algorithm is proposed for the problem. To compute an estimate of the gradient, an asymptotically biased estimator is proposed. The proposed algorithm is shown to converge to within $\epsilon$ of the global optimum after sampling $\mathcal{O}\left(\frac{M^4 \sigma^2}{(1-\gamma)^8 \epsilon^4}\right)$ trajectories, where $\gamma$ is the discount factor and $M$ is the number of agents, thus achieving the same dependence on $\epsilon$ as the policy gradient algorithm for standard reinforcement learning.
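As a rough illustration of the two ideas in the abstract, the sketch below evaluates the stated trajectory bound and forms a chain-rule gradient of a concave scalarization from per-objective estimates. It is a minimal sketch under assumptions, not the authors' algorithm: the function names, the `constant` factor hidden by the big-O notation, and the choice of a sum-of-logs scalarization are all hypothetical and do not come from this record. Plugging estimated objective values into the partial derivatives of a non-linear function generally yields a biased estimate, which is one plausible reading of why the abstract mentions a biased estimator.

```python
def trajectories_needed(M, sigma, gamma, epsilon, constant=1.0):
    """Evaluate constant * M^4 * sigma^2 / ((1 - gamma)^8 * epsilon^4).

    M, sigma, gamma, and epsilon are the symbols from the abstract;
    `constant` is a hypothetical stand-in for the factor hidden by O(.).
    """
    return constant * (M ** 4) * (sigma ** 2) / (((1.0 - gamma) ** 8) * (epsilon ** 4))


def log_sum_partials(J):
    """Partial derivatives of the concave function f(J) = sum_k log(J_k).

    This scalarization is an illustrative assumption, not taken from the record.
    """
    return [1.0 / j for j in J]


def scalarized_gradient(f_partials, J_hat, grad_J_hat):
    """Chain-rule plug-in estimate of the gradient of f(J_1(theta), ..., J_K(theta)).

    f_partials(J) returns the partial derivatives of f evaluated at J.
    J_hat is a list of estimated objective values.
    grad_J_hat is a list of per-objective gradient estimates, each a list
    of floats with one entry per policy parameter.
    """
    weights = f_partials(J_hat)
    dim = len(grad_J_hat[0])
    # Weighted sum of the per-objective gradients, one coordinate at a time.
    return [sum(w * g[i] for w, g in zip(weights, grad_J_hat)) for i in range(dim)]
```

Because the bound scales as $\epsilon^{-4}$, halving `epsilon` in `trajectories_needed` multiplies the result by 16, and the $(1-\gamma)^{-8}$ factor makes the bound grow quickly as `gamma` approaches 1.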
Pages: 1565 - 1597
Page count: 33
Related papers
50 in total
  • [1] Joint Optimization of Concave Scalarized Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm
    Bai, Qinbo
    Agarwal, Mridul
    Aggarwal, Vaneet
    Journal of Artificial Intelligence Research, 2022, 74 : 1565 - 1597
  • [2] An Improved Multi-objective Optimization Algorithm Based on Reinforcement Learning
    Liu, Jun
    Zhou, Yi
    Qiu, Yimin
    Li, Zhongfeng
    ADVANCES IN SWARM INTELLIGENCE, ICSI 2022, PT I, 2022 : 501 - 513
  • [3] Scalarized Multi-Objective Reinforcement Learning: Novel Design Techniques
    Van Moffaert, Kristof
    Drugan, Madalina M.
    Nowe, Ann
    PROCEEDINGS OF THE 2013 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING (ADPRL), 2013 : 191 - 199
  • [4] A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation
    Yang, Runzhe
    Sun, Xingyuan
    Narasimhan, Karthik
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [5] Reinforcement Learning-Based Hybrid Multi-Objective Optimization Algorithm Design
    Palm, Herbert
    Arndt, Lorin
    INFORMATION, 2023, 14 (05)
  • [6] A multi-objective optimization algorithm based on gradient information
    Qi, Rongbin
    Liu, Chenxia
    Zhong, Weimin
    Qian, Feng
    Huagong Xuebao/CIESC Journal, 2013, 64 (12) : 4401 - 4409
  • [7] Decomposition based Multi-Objective Evolutionary Algorithm in XCS for Multi-Objective Reinforcement Learning
    Cheng, Xiu
    Browne, Will N.
    Zhang, Mengjie
    2018 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2018 : 622 - 629
  • [8] Multimodal Scalarized Preferences in Multi-objective Optimization
    Braun, Marlon
    Heling, Lars
    Shukla, Pradyumn
    Schmeck, Hartmut
    PROCEEDINGS OF THE 2017 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE (GECCO'17), 2017 : 545 - 552
  • [9] Safety Optimized Reinforcement Learning via Multi-Objective Policy Optimization
    Honari, Homayoun
    Tamizi, Mehran Ghafarian
    Najjaran, Homayoun
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2024, 2024 : 2873 - 2879
  • [10] Latent-Conditioned Policy Gradient for Multi-Objective Deep Reinforcement Learning
    Kanazawa, Takuya
    Gupta, Chetan
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VI, 2023, 14259 : 63 - 76