RSAC: A Robust Deep Reinforcement Learning Strategy for Dimensionality Perturbation

Cited by: 2
Authors
Gupta, Surbhi [1 ]
Singal, Gaurav [2 ]
Garg, Deepak [1 ]
Das, Swagatam [3 ]
Affiliations
[1] Bennett Univ, Greater Noida 201310, Uttar Pradesh, India
[2] Netaji Subhas Univ Technol, New Delhi 110078, India
[3] ISI, Kolkata 700108, W Bengal, India
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2022, Vol. 6, No. 5
Keywords
Sensors; Perturbation methods; Robustness; Noise measurement; Training; Robot sensing systems; Sensor systems; DRL; sensor; perturbation; robust; locomotion; actor-critic; OpenAI gym; FAULT; ROBOT;
DOI
10.1109/TETCI.2022.3157003
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Artificial agents in autonomous systems such as autonomous vehicles, robots, and drones make predictions on data generated by fusing values from many sources, such as different sensors. Sensor malfunction is a well-known problem in the robotics domain. In deep reinforcement learning (DRL), a correct sensor observation corresponds to the true estimate of a dimension of the state vector; noisy estimates from these sensors therefore lead to dimensionality impairment in the state. DRL policies have been shown to falter, choosing wrong actions, under adversarial attack or modeling error. It is therefore necessary to examine the effect of dimensionality perturbation on a neural policy. In this regard, we analyze whether a subtle dimensionality perturbation, arising from noise in the input source at test time, distracts the agent's decisions. We also propose RSAC (robust soft actor-critic), an approach that uses the noisy state for prediction but estimates the target from the nominal observation. We find that injecting such noisy input during training does not hamper learning. Simulations in the OpenAI Gym MuJoCo Walker2d-v2 environment show empirically that the proposed approach matches SAC's performance and is robust to test-time dimensionality perturbation.
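To make the abstract's training scheme concrete, the sketch below illustrates the core idea as we read it: the value prediction is computed from a dimensionality-perturbed (noisy) state, while the bootstrap target is computed from the nominal (clean) observation. This is a minimal illustration, not the authors' implementation; the linear critic, the single-dimension Gaussian corruption in `perturb_dimension`, and the noise scale `sigma` are all assumptions standing in for the full SAC machinery.

```python
# Minimal sketch (not the authors' code) of prediction-from-noisy-state,
# target-from-nominal-observation, as described in the RSAC abstract.
import numpy as np

rng = np.random.default_rng(0)

def perturb_dimension(state: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Corrupt one randomly chosen dimension of the state, mimicking a
    faulty sensor reading (assumed perturbation model)."""
    noisy = state.copy()
    idx = rng.integers(state.shape[0])
    noisy[idx] += rng.normal(0.0, sigma)
    return noisy

# Toy linear critic standing in for the SAC Q-network.
dim = 17                       # Walker2d-v2 observation size
w = rng.normal(size=dim) * 0.01
gamma, lr = 0.99, 1e-3

def q(s: np.ndarray) -> float:
    return float(w @ s)

def td_update(s: np.ndarray, r: float, s_next: np.ndarray) -> float:
    """One TD step: the prediction uses the noisy state, while the
    bootstrap target uses the nominal next observation."""
    global w
    s_noisy = perturb_dimension(s)       # prediction input is perturbed
    target = r + gamma * q(s_next)       # target uses the clean observation
    td_error = target - q(s_noisy)
    w += lr * td_error * s_noisy         # semi-gradient update
    return td_error

# Usage on random stand-in transitions:
for _ in range(5):
    s, s_next = rng.normal(size=dim), rng.normal(size=dim)
    print(f"TD error: {td_update(s, 1.0, s_next):+.4f}")
```

The design point this sketch isolates is that the noise is injected only on the prediction path during training, so the learning target stays anchored to the nominal observation; per the abstract, this exposure to noisy inputs is what confers test-time robustness without hampering learning.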
Pages: 1157-1166
Page count: 10