Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning

Cited by: 4
Authors
Kim, Man-Je [1]
Park, Hyunsoo [2]
Ahn, Chang Wook [1]
Affiliations
[1] Gwangju Inst Sci & Technol, AI Grad Sch, Gwangju 61005, South Korea
[2] NCSOFT, Seongnam Si 13494, South Korea
Funding
National Research Foundation of Singapore;
Keywords
reinforcement learning; multi-objective optimization; real-time environment;
DOI
10.3390/electronics11071069
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Control intelligence is a field in which target objectives typically trade off against one another, and researchers in this field have long sought artificial intelligence that can achieve those objectives. Multi-objective deep reinforcement learning has proven able to meet this need. In particular, multi-objective deep reinforcement learning methods based on policy optimization are leading the optimization of control intelligence. However, because of the greedy nature of reinforcement learning, multi-objective reinforcement learning has difficulty finding a diverse set of Pareto-optimal solutions. We propose a policy assimilation method to solve this problem. The method was applied to MO-V-MPO, a preference-based multi-objective reinforcement learning algorithm, to increase diversity. Its performance was verified through experiments in a continuous control environment.
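For readers unfamiliar with the term "nondominated" in the title, the following is a minimal sketch of Pareto dominance and nondominated filtering over per-policy return vectors, assuming every objective is maximized. It is a generic illustration only, not the policy assimilation method or MO-V-MPO described in the abstract; the function names and the sample return vectors are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if return vector a Pareto-dominates b (maximization).

    a dominates b when a is at least as good on every objective
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def nondominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical two-objective returns for five policies (illustrative values).
policy_returns = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0), (0.5, 0.5)]
front = nondominated(policy_returns)
print(front)
```

In this sketch, a policy set with a diverse nondominated front covers different trade-offs between the objectives, which is the kind of diversity the abstract says is hard to reach with greedy updates.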
Pages: 8