Joint Optimization of Concave Scalarized Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

Cited by: 0
Authors
Bai, Qinbo [1 ]
Agarwal, Mridul [1 ]
Aggarwal, Vaneet [1 ]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
Source
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH | 2022 / Vol. 74
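Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes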
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term objectives. A policy-gradient based model-free algorithm is proposed for the problem. To compute an estimate of the gradient, an asymptotically biased estimator is proposed. The proposed algorithm is shown to converge to within $\epsilon$ of the global optimum after sampling $O\left(\frac{M^4 \sigma^2}{(1-\gamma)^8 \epsilon^4}\right)$ trajectories, where $\gamma$ is the discount factor and $M$ is the number of agents, thus achieving the same dependence on $\epsilon$ as the policy gradient algorithm for standard reinforcement learning.
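To get a feel for the trajectory bound quoted in the abstract, the short sketch below evaluates only its scaling term, M^4 sigma^2 / ((1 - gamma)^8 epsilon^4). The function name `trajectory_scaling` is hypothetical, and the constant hidden by the O(.) notation is not given in the abstract, so it is assumed to be 1 here. The returned values are only meaningful relative to one another.

```python
def trajectory_scaling(M, sigma, gamma, eps):
    """Scaling term of the bound in the abstract, with the hidden constant assumed to be 1.

    M     : the quantity called "number of agents" in the abstract
    sigma : the sigma appearing in the bound (enters as sigma^2)
    gamma : discount factor, must satisfy 0 <= gamma < 1
    eps   : target accuracy, must be positive
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    return (M ** 4) * (sigma ** 2) / (((1.0 - gamma) ** 8) * (eps ** 4))


# Relative comparisons: halving eps or doubling M each multiply the term by 2^4 = 16.
base = trajectory_scaling(M=2, sigma=1.0, gamma=0.5, eps=0.5)
tighter = trajectory_scaling(M=2, sigma=1.0, gamma=0.5, eps=0.25)
more_agents = trajectory_scaling(M=4, sigma=1.0, gamma=0.5, eps=0.5)
```

Under this reading, the term is most sensitive to the discount factor: moving gamma from 0.5 to 0.75 halves (1 - gamma) and so multiplies the term by 2^8 = 256.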
Pages: 1565-1597
Page count: 33
Related papers
50 in total
  • [31] Multi-objective optimization using teaching-learning-based optimization algorithm
    Zou, Feng
    Wang, Lei
    Hei, Xinhong
    Chen, Debao
    Wang, Bin
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (04) : 1291 - 1300
  • [32] Multi-objective Reinforcement Learning with Path Integral Policy Improvement
    Ariizumi, Ryo
    Sago, Hayato
    Asai, Toru
    Azuma, Shun-ichi
    2023 62ND ANNUAL CONFERENCE OF THE SOCIETY OF INSTRUMENT AND CONTROL ENGINEERS, SICE, 2023, : 1418 - 1423
  • [33] Neuroevolutionary diversity policy search for multi-objective reinforcement learning
    Zhou, Dan
    Du, Jiqing
    Arai, Sachiyo
    INFORMATION SCIENCES, 2024, 657
  • [34] An Improved Multi-Objective Deep Reinforcement Learning Algorithm Based on Envelope Update
    Hu, Can
    Zhu, Zhengwei
    Wang, Lijia
    Zhu, Chenyang
    Yang, Yanfei
    ELECTRONICS, 2022, 11 (16)
  • [35] Superconducting quantum computing optimization based on multi-objective deep reinforcement learning
    Liu, Yangting
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [36] Combining a gradient-based method and an evolution strategy for multi-objective reinforcement learning
    Chen, Diqi
    Wang, Yizhou
    Gao, Wen
    APPLIED INTELLIGENCE, 2020, 50 (10) : 3301 - 3317
  • [37] A reinforcement learning-based multi-objective optimization in an interval and dynamic environment
    Xu, Yue
    Song, Yuxuan
    Pi, Dechang
    Chen, Yang
    Qin, Shuo
    Zhang, Xiaoge
    Yang, Shengxiang
    KNOWLEDGE-BASED SYSTEMS, 2023, 280
  • [38] Combining a gradient-based method and an evolution strategy for multi-objective reinforcement learning
    Diqi Chen
    Yizhou Wang
    Wen Gao
    Applied Intelligence, 2020, 50 : 3301 - 3317
  • [39] A Multi-objective Generalized Teacher-Learning-Based-Optimization Algorithm
    Ram S.D.K.
    Srivastava S.
    Mishra K.K.
    Journal of The Institution of Engineers (India): Series B, 2022, 103 (5) : 1415 - 1430
  • [40] A Multi-objective Generalized Teacher-Learning-Based-Optimization Algorithm
    Ram, Satya Deo Kumar
    Srivastava, Shashank
    Mishra, K.K.
    Journal of The Institution of Engineers (India): Series B, 2022, 103 (05) : 1415 - 1430