Meta attention for Off-Policy Actor-Critic

Cited by: 6
Authors
Huang, Jiateng [1 ,2 ]
Huang, Wanrong [1 ,2 ]
Lan, Long [1 ,2 ]
Wu, Dan [3 ,4 ]
Affiliations
[1] Natl Univ Def Technol, Inst Quantum Informat, Coll Comp Sci & Technol, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, State Key Lab High Performance Comp, Changsha 410073, Hunan, Peoples R China
[3] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China
[4] Natl Univ Def Technol, Hefei Interdisciplinary Ctr, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Reinforcement learning; Meta learning; Attention mechanism; Actor-Critic methods;
DOI
10.1016/j.neunet.2023.03.024
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Off-Policy Actor-Critic methods can effectively exploit past experiences and have therefore achieved great success in various reinforcement learning tasks. In many image-based and multi-agent tasks, the attention mechanism has been employed in Actor-Critic methods to improve their sampling efficiency. In this paper, we propose a meta-attention method for state-based reinforcement learning tasks, which combines the attention mechanism and meta-learning within the Off-Policy Actor-Critic framework. Unlike previous attention-based work, our meta-attention method introduces attention into the Actor and the Critic of the typical Actor-Critic framework, rather than over the pixels of an image or the multiple information sources of specific image-based control tasks or multi-agent systems. In contrast to existing meta-learning methods, the proposed meta-attention approach functions in both the gradient-based training phase and the agent's decision-making process. Experimental results demonstrate the superiority of our meta-attention method on various continuous control tasks, built on Off-Policy Actor-Critic methods including DDPG and TD3. (C) 2023 Elsevier Ltd. All rights reserved.
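The abstract describes attention placed inside both the Actor and the Critic of an off-policy Actor-Critic agent (e.g., DDPG or TD3) for state-based tasks. The sketch below is a minimal, hypothetical PyTorch-style illustration of that idea: a learned soft weighting over state features feeding otherwise standard Actor and Critic networks. The class and parameter names (FeatureAttention, MetaActor, MetaCritic, hidden_dim) are assumptions for illustration only, not the paper's published code, and the meta-learning of the attention parameters is omitted.

```python
# Hypothetical sketch of attention-augmented Actor and Critic networks for
# state-based continuous control. Names and layer sizes are illustrative
# assumptions, not the architecture published in the paper.
import torch
import torch.nn as nn


class FeatureAttention(nn.Module):
    """Produces soft weights over the input feature dimensions."""

    def __init__(self, in_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, in_dim),
            nn.Softmax(dim=-1),
        )

    def forward(self, x):
        # Element-wise re-weighting of the raw input features.
        return x * self.net(x)


class MetaActor(nn.Module):
    """Deterministic policy with feature attention applied to the state."""

    def __init__(self, state_dim, action_dim, max_action, hidden_dim=256):
        super().__init__()
        self.attention = FeatureAttention(state_dim)
        self.policy = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.policy(self.attention(state))


class MetaCritic(nn.Module):
    """Q-network with feature attention over the state-action input."""

    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.attention = FeatureAttention(state_dim + action_dim)
        self.q = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state, action):
        return self.q(self.attention(torch.cat([state, action], dim=-1)))
```

In a full implementation of the method described in the abstract, the attention parameters would additionally be updated by a meta-objective so that the learned weighting acts during both gradient-based training and the agent's decision-making; that meta-level update is not shown here.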
Pages: 86-96
Page count: 11