Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning

Cited by: 4

Authors:

Kim, Man-Je [1]

Park, Hyunsoo [2]

Ahn, Chang Wook [1]

Affiliations:

[1] Gwangju Inst Sci & Technol, AI Grad Sch, Gwangju 61005, South Korea

[2] NCSOFT, Seongnam Si 13494, South Korea

Source:

ELECTRONICS | 2022 / Vol. 11 / Issue 07

Funding:

National Research Foundation, Singapore;

Keywords:

reinforcement learning; multi-objective optimization; real-time environment;

DOI:

10.3390/electronics11071069

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Control intelligence is a typical field in which target objectives trade off against one another, and researchers in this field have long sought artificial intelligence that achieves those objectives. Multi-objective deep reinforcement learning has met this need. In particular, multi-objective deep reinforcement learning methods based on policy optimization are leading the optimization of control intelligence. However, multi-objective reinforcement learning has difficulty finding a diverse set of Pareto-optimal solutions because of the greedy nature of reinforcement learning. We propose a policy assimilation method to solve this problem. The method was applied to MO-V-MPO, a preference-based multi-objective reinforcement learning method, to increase diversity. Its performance has been verified through experiments in a continuous control environment.

Pages: 8
