Behavior fusion for deep reinforcement learning

Cited by: 6

Authors:

Shi, Haobin [1]

Xu, Meng [1]

Hwang, Kao-Shing [2,3]

Cai, Bo-Yin [2,3]

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China

[2] Natl Sun Yat Sen Univ, Dept Elect Engn, Kaohsiung 80424, Taiwan

[3] Kaohsiung Med Univ, Dept Healthcare Adm & Med Informat, Kaohsiung 80708, Taiwan

Source:

ISA TRANSACTIONS | 2020 / Vol. 98

Funding:

National Natural Science Foundation of China;

Keywords:

Deep reinforcement learning; Actor-critic; Policy gradient; Behavior fusion; Complex task; DECISION-MAKING; ENVIRONMENT; NAVIGATION; GRADIENT; NETWORK;

DOI:

10.1016/j.isatra.2019.08.054

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In deep reinforcement learning (DRL) systems, designing a reward function for a complex task is difficult, so this paper proposes a behavior fusion framework for the actor-critic architecture, which learns the policy from an advantage function that consists of two value functions. First, the proposed method decomposes a complex task into several sub-tasks and merges the policies trained for those sub-tasks into a unified policy for the complex task, instead of designing a new reward function and training a policy for it. Each sub-task is trained individually by an actor-critic algorithm using a simple reward function. These pre-trained sub-tasks serve as building blocks for rapidly assembling a prototype of a complicated task. Second, the proposed method integrates the modules in the calculation of the policy gradient by calculating the accumulated returns to reduce variation. Third, two alternative methods for acquiring integrated returns for the complicated task are proposed. The Atari 2600 Pong game and a wafer probe task are used to validate the performance of the proposed methods by comparison with a method that uses a gate network. (C) 2019 ISA. Published by Elsevier Ltd. All rights reserved.
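The idea of combining accumulated returns from separately rewarded sub-tasks can be illustrated with a minimal sketch. The abstract does not specify how the integrated return is formed, so the weighted sum used here, the function names (`discounted_returns`, `fuse_returns`, `advantages`), the weights, and the baseline values are all assumptions made for illustration and are not the paper's method.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Accumulated discounted return G_t for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the following step.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def fuse_returns(subtask_returns: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Hypothetical fusion rule: weighted sum of per-sub-task returns.

    This is an assumed stand-in; the source does not state the actual rule.
    """
    if len(subtask_returns) != len(weights):
        raise ValueError("one weight is required per sub-task")
    horizon = len(subtask_returns[0])
    return [
        sum(w * g[t] for w, g in zip(weights, subtask_returns))
        for t in range(horizon)
    ]


def advantages(fused: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Advantage estimate: integrated return minus a critic's value baseline."""
    return [g - v for g, v in zip(fused, baseline)]


# Two illustrative sub-tasks, each with its own simple reward signal.
track_rewards = [0.0, 0.0, 1.0]   # e.g. reward for reaching a target
avoid_rewards = [-1.0, 0.0, 0.0]  # e.g. penalty for an early collision

g_track = discounted_returns(track_rewards, gamma=0.5)
g_avoid = discounted_returns(avoid_rewards, gamma=0.5)
fused = fuse_returns([g_track, g_avoid], weights=[1.0, 0.5])
adv = advantages(fused, baseline=[0.0, 0.0, 0.0])
```

In an actor-critic update, each advantage value would scale the log-probability gradient of the action taken at that step. With a learned critic, the zero baseline above would be replaced by the critic's value estimates.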

Pages: 434-444

Page count: 11
