Joint Optimization of Concave Scalarized Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

Cited by: 0

Authors:

Bai, Qinbo [1]

Agarwal, Mridul [1]

Aggarwal, Vaneet [1]

Affiliation:

[1] Purdue Univ, W Lafayette, IN 47907 USA

Source:

JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH | 2022 / Vol. 74

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term objectives. A policy-gradient-based, model-free algorithm is proposed for the problem. To compute an estimate of the gradient, an asymptotically biased estimator is proposed. The proposed algorithm is shown to converge to within $\epsilon$ of the global optimum after sampling $O\left(\frac{M^4 \sigma^2}{(1-\gamma)^8 \epsilon^4}\right)$ trajectories, where $\gamma$ is the discount factor and $M$ is the number of agents, thus achieving the same dependence on $\epsilon$ as the policy gradient algorithm for standard reinforcement learning.
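The idea of ascending the gradient of a concave function of several objectives can be illustrated with a toy sketch. This is not the paper's algorithm or its estimator. It assumes a single-state problem with a softmax policy, exact expectations instead of sampled trajectories, no discounting, and a sum-of-logs scalarization chosen only as one example of a concave function. All names (`softmax`, `objectives`, `scalarize`, `scalarized_gradient`, `ascend`, `REWARDS`) are hypothetical.

```python
import math

# Hypothetical toy data: two actions, two objectives; REWARDS[a][k] is the
# reward of action a on objective k.
REWARDS = [[1.0, 0.1], [0.1, 1.0]]

def softmax(theta):
    """Softmax policy over actions, computed in a numerically stable way."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def objectives(pi, rewards):
    """Expected value J_k of each objective under action distribution pi."""
    num_obj = len(rewards[0])
    return [sum(p * r[k] for p, r in zip(pi, rewards)) for k in range(num_obj)]

def scalarize(J):
    """Assumed concave scalarization: f(J) = sum_k log(J_k)."""
    return sum(math.log(j) for j in J)

def scalarized_gradient(theta, rewards):
    """Chain rule: grad f = sum_k (df/dJ_k) * grad J_k, with df/dJ_k = 1/J_k.

    For a softmax policy, dJ_k/dtheta_a = pi_a * (rewards[a][k] - J_k).
    """
    pi = softmax(theta)
    J = objectives(pi, rewards)
    weights = [1.0 / j for j in J]
    return [
        sum(w * p * (rewards[a][k] - J[k]) for k, w in enumerate(weights))
        for a, p in enumerate(pi)
    ]

def ascend(theta0, rewards, steps=2000, lr=0.5):
    """Plain gradient ascent on f(J(theta)); returns the final policy."""
    theta = list(theta0)
    for _ in range(steps):
        grad = scalarized_gradient(theta, rewards)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return softmax(theta)

final_policy = ascend([2.0, 0.0], REWARDS)
```

In this symmetric toy instance the sum-of-logs objective is maximized by mixing the two actions equally, so `final_policy` should end up close to `[0.5, 0.5]` even though the initial parameters favor the first action.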

Pages: 1565-1597

Page count: 33
