Joint Optimization of Concave Scalarized Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

Cited by: 0

Authors:

Bai, Qinbo [1]

Agarwal, Mridul [1]

Aggarwal, Vaneet [1]

Affiliations:

[1] Purdue Univ, W Lafayette, IN 47907 USA

Source:

JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH | 2022 / Volume 74

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term objectives. A policy-gradient based model-free algorithm is proposed for the problem. To compute an estimate of the gradient, an asymptotically biased estimator is proposed. The proposed algorithm is shown to converge to within $\epsilon$ of the global optimum after sampling $\mathcal{O}\left(\frac{M^4 \sigma^2}{(1-\gamma)^8 \epsilon^4}\right)$ trajectories, where $\gamma$ is the discount factor and $M$ is the number of agents, thus achieving the same dependence on $\epsilon$ as the policy gradient algorithm for standard reinforcement learning.
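As a rough illustration of the idea in the abstract, the gradient of a concave scalarization f(J_1, ..., J_M) follows from the chain rule as the sum over objectives of (df/dJ_m) times the gradient of J_m. The sketch below is not the authors' estimator: it assumes f is a sum of logarithms (one common concave choice), assumes the per-objective values and policy gradients are already estimated, and uses the hypothetical helper names `scalarized_gradient` and `trajectory_bound`. The second helper only evaluates the stated order of growth with all constants dropped.

```python
def scalarized_gradient(objective_values, objective_gradients, floor=1e-8):
    """Chain-rule gradient of f(J) = sum_m log(J_m) (assumed concave example).

    objective_values: estimates of the M long-term objectives J_m (positive).
    objective_gradients: M lists, each the estimated gradient of J_m
    with respect to the policy parameters.
    """
    dim = len(objective_gradients[0])
    grad = [0.0] * dim
    for j_m, g_m in zip(objective_values, objective_gradients):
        weight = 1.0 / max(j_m, floor)  # d/dJ_m of log(J_m), guarded near zero
        for i in range(dim):
            grad[i] += weight * g_m[i]
    return grad


def trajectory_bound(M, sigma, gamma, epsilon):
    """Order of the trajectory count from the abstract, constants omitted."""
    return (M ** 4) * (sigma ** 2) / (((1.0 - gamma) ** 8) * (epsilon ** 4))


if __name__ == "__main__":
    values = [2.0, 4.0]                  # hypothetical objective estimates
    grads = [[1.0, 0.0], [0.0, 2.0]]     # hypothetical per-objective gradients
    print(scalarized_gradient(values, grads))
    # Halving epsilon multiplies the bound by 2**4 = 16.
    print(trajectory_bound(2, 1.0, 0.5, 0.5) / trajectory_bound(2, 1.0, 0.5, 1.0))
```

Because the weights depend on the current objective estimates, any error in those estimates carries over into the gradient, which is consistent with the abstract's remark that the gradient estimator is biased.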


Pages: 1565 - 1597

Page count: 33

50 items in total

[21] A Dynamic Multi-objective Scheduling Approach for Gradient-Based Reinforcement Learning
Hengel, Katharina
Wagner, Achim
Ruskowski, Martin
IFAC PAPERSONLINE, 2024, 58 (19) : 49 - 54
[22] A Multi-objective Reinforcement Learning Algorithm for JSSP
Mendez-Hernandez, Beatriz M.
Rodriguez-Bazan, Erick D.
Martinez-Jimenez, Yailen
Libin, Pieter
Nowe, Ann
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: THEORETICAL NEURAL COMPUTATION, PT I, 2019, 11727 : 567 - 584
[23] Predicting optimal value functions by interpolating reward functions in scalarized multi-objective reinforcement learning
Kusari, Arpan
How, Jonathan P.
2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 7484 - 7490
[24] Multi-objective multicast optimization with deep reinforcement learning
Li, Xiaole
Tian, Jinwei
Wang, Cuiping
Jiang, Yinghui
Wang, Xing
Wang, Jiuru
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2025, 28 (04):
[25] Multi-Objective Optimization in Disaster Backup with Reinforcement Learning
Yi, Shanwen
Qin, Yao
Wang, Hua
MATHEMATICS, 2025, 13 (03)
[26] A reinforcement learning approach for dynamic multi-objective optimization
Zou, Fei
Yen, Gary G.
Tang, Lixin
Wang, Chunfeng
INFORMATION SCIENCES, 2021, 546 : 815 - 834
[27] A Reinforcement Learning based evolutionary multi-objective optimization algorithm for spectrum allocation in Cognitive Radio networks
Kaur, Amandeep
Kumar, Krishan
PHYSICAL COMMUNICATION, 2020, 43
[28] Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning
Horie, Naoto
Matsui, Tohgoroh
Moriyama, Koichi
Mutoh, Atsuko
Inuzuka, Nobuhiro
ARTIFICIAL LIFE AND ROBOTICS, 2019, 24 (03) : 352 - 359
[29] Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning
Naoto Horie
Tohgoroh Matsui
Koichi Moriyama
Atsuko Mutoh
Nobuhiro Inuzuka
Artificial Life and Robotics, 2019, 24 : 352 - 359
[30] Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning
Kim, Man-Je
Park, Hyunsoo
Ahn, Chang Wook
ELECTRONICS, 2022, 11 (07)
