Behavior fusion for deep reinforcement learning

Citations: 6
Authors
Shi, Haobin [1]
Xu, Meng [1]
Hwang, Kao-Shing [2,3]
Cai, Bo-Yin [2,3]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Natl Sun Yat Sen Univ, Dept Elect Engn, Kaohsiung 80424, Taiwan
[3] Kaohsiung Med Univ, Dept Healthcare Adm & Med Informat, Kaohsiung 80708, Taiwan
Funding
National Natural Science Foundation of China;
Keywords
Deep reinforcement learning; Actor-critic; Policy gradient; Behavior fusion; Complex task; DECISION-MAKING; ENVIRONMENT; NAVIGATION; GRADIENT; NETWORK;
DOI
10.1016/j.isatra.2019.08.054
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Designing a reward function for complex tasks is difficult in deep reinforcement learning (DRL) systems, so this paper proposes a behavior fusion framework for the actor-critic architecture, which learns the policy from an advantage function that consists of two value functions. Firstly, the proposed method decomposes a complex task into several sub-tasks and merges the trained policies for those sub-tasks into a unified policy for the complex task, instead of designing a new reward function and training a policy from scratch. Each sub-task is trained individually by an actor-critic algorithm using a simple reward function. These pre-trained sub-tasks are building blocks that can be used to rapidly assemble a prototype of a complicated task. Secondly, the proposed method integrates the modules in the calculation of the policy gradient by calculating the accumulated returns to reduce variation. Thirdly, two alternative methods to acquire integrated returns for the complicated task are also proposed. The Atari 2600 Pong game and a wafer probe task are used to validate the performance of the proposed methods by comparison with a method that uses a gate network. © 2019 ISA. Published by Elsevier Ltd. All rights reserved.
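The idea of combining accumulated returns from separately rewarded sub-tasks can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify how returns are integrated, so the weighted-sum rule, the function names (`discounted_returns`, `fused_returns`, `advantages`), the fixed weights, and the discount factor below are all assumptions chosen only to make the concept concrete.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Accumulated discounted return G_t = r_t + gamma * G_{t+1} for every step."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def fused_returns(
    subtask_rewards: Sequence[Sequence[float]],
    weights: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Hypothetical fusion rule: weighted sum of the per-sub-task returns.

    The weighted sum is an assumption for illustration; the source does not
    state which integration rule the authors use.
    """
    per_task = [discounted_returns(r, gamma) for r in subtask_rewards]
    horizon = len(per_task[0])
    return [
        sum(w * g[t] for w, g in zip(weights, per_task))
        for t in range(horizon)
    ]


def advantages(returns: Sequence[float], values: Sequence[float]) -> List[float]:
    """Advantage estimate: integrated return minus a critic's value baseline."""
    return [g - v for g, v in zip(returns, values)]


# Two sub-tasks, each with its own simple reward signal over three steps.
rewards_a = [1.0, 1.0, 1.0]
rewards_b = [0.0, 0.0, 2.0]
fused = fused_returns([rewards_a, rewards_b], weights=[0.5, 0.5], gamma=0.5)
adv = advantages(fused, values=[1.0, 1.0, 1.0])
```

In an actor-critic update, an advantage estimate of this kind would scale the log-probability gradient of the chosen action; here the baseline values are placeholders standing in for a critic's predictions.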
Pages: 434-444
Number of pages: 11