Averaged Soft Actor-Critic for Deep Reinforcement Learning

Cited by: 15

Authors:

Ding, Feng [1]

Ma, Guanfeng [1]

Chen, Zhikui [1]

Gao, Jing [1]

Li, Peng [1]

Affiliations:

[1] Dalian Univ Technol, Sch Software Technol, Dalian, Peoples R China

Source:

COMPLEXITY | 2021 / Vol. 2021

Funding:

National Natural Science Foundation of China;

Keywords:

Deep learning;

DOI:

10.1155/2021/6658724

Chinese Library Classification (CLC) number:

O1 [Mathematics];

Subject classification code:

0701 ; 070101 ;

Abstract:

With the advent of the era of artificial intelligence, deep reinforcement learning (DRL) has achieved unprecedented success in high-dimensional, large-scale artificial intelligence tasks. However, the insecurity and instability of DRL algorithms have an important impact on their performance. The Soft Actor-Critic (SAC) algorithm uses advanced functions to update the policy and value networks, which alleviates some of these problems, but it still suffers from overestimation. To reduce the error caused by overestimation in SAC, we propose a new SAC algorithm called Averaged-SAC. By averaging previously learned state-action estimates, it reduces the overestimation problem of soft Q-learning, which makes training more stable and improves performance. We evaluate Averaged-SAC on several tasks in the MuJoCo environment. The experimental results show that Averaged-SAC effectively improves both the performance of SAC and the stability of the training process.
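The averaging idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `averaged_soft_target`, the snapshot count `K`, the discount `gamma`, the entropy temperature `alpha`, and all numeric values are assumptions made for illustration, and the target shown is the standard soft Bellman target with the next-state Q value replaced by a mean over recent estimates.

```python
from collections import deque
from statistics import fmean


def averaged_soft_target(reward, done, next_q_estimates, next_log_prob,
                         gamma=0.99, alpha=0.2):
    """Soft Bellman target that uses the mean of several Q estimates.

    next_q_estimates holds Q(s', a') as evaluated by each of the K most
    recent value-function snapshots (hypothetical inputs for this sketch).
    """
    avg_q = fmean(next_q_estimates)              # average over the K snapshots
    soft_value = avg_q - alpha * next_log_prob   # entropy-regularized value
    return reward + gamma * (1.0 - done) * soft_value


# Keep only the K most recent estimates; older ones fall out of the deque.
K = 3
snapshots = deque(maxlen=K)
for q in [10.0, 12.0, 11.0, 13.0]:   # made-up estimates of Q(s', a')
    snapshots.append(q)

target = averaged_soft_target(reward=1.0, done=0.0,
                              next_q_estimates=list(snapshots),
                              next_log_prob=-1.0)
```

Averaging leaves the expected estimate unchanged while reducing the variance of its error across snapshots, which is one plausible reading of why the abstract links it to reduced overestimation and more stable training.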

References

Pages: 16

21 entries in total

[1] Anschel O, 2017, 34 INT C MACHINE LEA, V70

[2] NEURONLIKE ADAPTIVE ELEMENTS THAT CAN SOLVE DIFFICULT LEARNING CONTROL-PROBLEMS [J]. BARTO, AG; SUTTON, RS; ANDERSON, CW. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1983, 13 (05): 834-846

[3] Hierarchical reinforcement learning for self-driving decision-making without reliance on labelled driving data [J]. Duan, Jingliang; Eben Li, Shengbo; Guan, Yang; Sun, Qi; Cheng, Bo. IET INTELLIGENT TRANSPORT SYSTEMS, 2020, 14 (05): 297-305

[4] Haarnoja T., 2018, Soft actor-critic algorithms and applications

[5] Haarnoja T, 2018, PR MACH LEARN RES, V80

[6] Hasselt H., 2010, Proceedings of the Advances in Neural Information Processing Systems, V23, P2613

[7] Heess N, 2015, ADV NEUR IN, V28

[8] Henderson P, 2018, AAAI CONF ARTIF INTE, P3207

[9] Kingma D., 2015, P INT C LEARN PRES I

[10] On actor-critic algorithms [J]. Konda, VR; Tsitsiklis, JN. SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2003, 42 (04): 1143-1166
