Action space noise optimization as exploration in deterministic policy gradient for locomotion tasks

Cited by: 0
Authors
Hesan Nobakht
Yong Liu
Affiliations
[1] Nanjing University of Science and Technology,School of Computer Science
Source
Applied Intelligence | 2022 / Vol. 52
Keywords
Deep reinforcement learning; Exploration; Action noise; Model of dynamics; Locomotion;
DOI: not available
Abstract
Reinforcement learning (RL) algorithms with deterministic actors (policies) commonly apply noise to the action space for exploration. These exploration methods are either undirected or require extra knowledge of the environment. To address these fundamental limitations, this paper introduces a parameterized stochastic action-noise policy (as a probability distribution) that aligns with the objective of the RL algorithm. This policy is optimized based on the state-action values of predicted future states. Consequently, the optimization does not rely on an explicit definition of the reward function, which improves the adaptability of this exploration strategy across environments and algorithms. Moreover, this paper presents a predictive model of the system dynamics (the transition probability) that captures the uncertainty of the environment with an optimized design and fewer parameters, significantly reducing model complexity while maintaining the same level of accuracy as current methods. The proposed method and models are evaluated and analyzed, demonstrating a significant increase in performance and reliability across various locomotion and control tasks in comparison with current methods.
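For context, the undirected action-space noise baseline that the abstract contrasts against can be sketched as follows. This is a minimal NumPy illustration, not the paper's method: the linear actor, the fixed noise scale, and the action bounds are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def deterministic_policy(state):
    # Placeholder actor: a fixed linear map followed by tanh squashing
    # (hypothetical weights, standing in for a trained deterministic actor).
    W = np.array([[0.5, -0.2],
                  [0.1,  0.3]])
    return np.tanh(W @ state)

def noisy_action(state, noise_std, low=-1.0, high=1.0):
    # Undirected exploration: perturb the deterministic action with
    # zero-mean Gaussian noise of a fixed scale, then clip to the
    # action bounds. The noise ignores both state and learning objective,
    # which is the limitation the paper's learned noise policy targets.
    a = deterministic_policy(state)
    a_noisy = a + rng.normal(0.0, noise_std, size=a.shape)
    return np.clip(a_noisy, low, high)

state = np.array([0.2, -0.4])
action = noisy_action(state, noise_std=0.1)
```

In contrast, the paper's exploration policy would replace the fixed `noise_std` with a parameterized distribution optimized from state-action values of predicted future states.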
Pages: 14218-14232
Page count: 14