Joint Optimization of Concave Scalarized Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

Cited by: 0
Authors
Bai, Qinbo [1 ]
Agarwal, Mridul [1 ]
Aggarwal, Vaneet [1 ]
Affiliation
[1] Purdue Univ, W Lafayette, IN 47907 USA
Source
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH | 2022, Vol. 74
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term objectives. A policy-gradient based model-free algorithm is proposed for the problem. To compute an estimate of the gradient, an asymptotically biased estimator is proposed. The proposed algorithm is shown to converge to within $\epsilon$ of the global optimum after sampling $O\left(\frac{M^4 \sigma^2}{(1-\gamma)^8 \epsilon^4}\right)$ trajectories, where $\gamma$ is the discount factor and $M$ is the number of agents, thus achieving the same dependence on $\epsilon$ as the policy gradient algorithm for standard reinforcement learning.
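The idea of taking a policy gradient through a concave scalarization can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a single-step (bandit-style) problem rather than discounted trajectories, a softmax policy, and a hypothetical reward table and scalarization `f(J) = sum(log(J + 0.1))`. All function and parameter names are illustrative. The gradient follows the chain rule, grad f = sum over objectives of (df/dJ_m) * grad J_m, with the sample mean of the returns plugged into df/dJ. Because f is non-linear, that plug-in estimate is in general biased for a finite number of samples.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def scalarized_gradient(theta, rewards, grad_f, n_samples, rng):
    """Plug-in score-function estimate of grad_theta f(J_1, ..., J_M).

    rewards: (A, M) array, reward of each action under each objective
             (a hypothetical single-step problem, assumed for illustration).
    grad_f:  callable returning df/dJ as an (M,) array.
    """
    probs = softmax(theta)
    n_actions = len(theta)
    actions = rng.choice(n_actions, size=n_samples, p=probs)
    r = rewards[actions]                          # (N, M) sampled rewards
    j_hat = r.mean(axis=0)                        # (M,) estimated objectives
    # Score function of a softmax policy: one_hot(a) - probs.
    score = np.eye(n_actions)[actions] - probs    # (N, A)
    grad_j = score.T @ r / n_samples              # (A, M), one column per objective
    # Chain rule with the estimate j_hat plugged into grad_f.
    return grad_j @ grad_f(j_hat), j_hat

def train(steps=300, lr=0.5, n_samples=2000, seed=0):
    rng = np.random.default_rng(seed)
    rewards = np.array([[1.0, 0.0],               # action 0 serves objective 1
                        [0.0, 1.0]])              # action 1 serves objective 2
    grad_f = lambda j: 1.0 / (j + 0.1)            # gradient of sum(log(j + 0.1))
    theta = np.array([2.0, 0.0])                  # start biased toward action 0
    for _ in range(steps):
        g, _ = scalarized_gradient(theta, rewards, grad_f, n_samples, rng)
        theta = theta + lr * g                    # gradient ascent
    return softmax(theta)

if __name__ == "__main__":
    print(train())
```

In this toy setting the concave objective is symmetric in the two objectives, so gradient ascent should drive the policy toward playing both actions about equally often, whereas maximizing either objective alone would favor a single action.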
Pages: 1565-1597
Page count: 33
Related papers
50 in total
  • [21] A Dynamic Multi-objective Scheduling Approach for Gradient-Based Reinforcement Learning
    Hengel, Katharina
    Wagner, Achim
    Ruskowski, Martin
    IFAC PAPERSONLINE, 2024, 58 (19): : 49 - 54
  • [22] A Multi-objective Reinforcement Learning Algorithm for JSSP
    Mendez-Hernandez, Beatriz M.
    Rodriguez-Bazan, Erick D.
    Martinez-Jimenez, Yailen
    Libin, Pieter
    Nowe, Ann
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: THEORETICAL NEURAL COMPUTATION, PT I, 2019, 11727 : 567 - 584
  • [23] Predicting optimal value functions by interpolating reward functions in scalarized multi-objective reinforcement learning
    Kusari, Arpan
    How, Jonathan P.
    2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 7484 - 7490
  • [24] Multi-objective multicast optimization with deep reinforcement learning
    Li, Xiaole
    Tian, Jinwei
    Wang, Cuiping
    Jiang, Yinghui
    Wang, Xing
    Wang, Jiuru
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2025, 28 (04):
  • [25] Multi-Objective Optimization in Disaster Backup with Reinforcement Learning
    Yi, Shanwen
    Qin, Yao
    Wang, Hua
    MATHEMATICS, 2025, 13 (03)
  • [26] A reinforcement learning approach for dynamic multi-objective optimization
    Zou, Fei
    Yen, Gary G.
    Tang, Lixin
    Wang, Chunfeng
    INFORMATION SCIENCES, 2021, 546 : 815 - 834
  • [27] A Reinforcement Learning based evolutionary multi-objective optimization algorithm for spectrum allocation in Cognitive Radio networks
    Kaur, Amandeep
    Kumar, Krishan
    PHYSICAL COMMUNICATION, 2020, 43
  • [28] Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning
    Horie, Naoto
    Matsui, Tohgoroh
    Moriyama, Koichi
    Mutoh, Atsuko
    Inuzuka, Nobuhiro
    ARTIFICIAL LIFE AND ROBOTICS, 2019, 24 (03) : 352 - 359
  • [30] Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning
    Kim, Man-Je
    Park, Hyunsoo
    Ahn, Chang Wook
    ELECTRONICS, 2022, 11 (07)