RSAC: A Robust Deep Reinforcement Learning Strategy for Dimensionality Perturbation

Cited by: 2
Authors
Gupta, Surbhi [1 ]
Singal, Gaurav [2 ]
Garg, Deepak [1 ]
Das, Swagatam [3 ]
Affiliations
[1] Bennett Univ, Greater Noida 201310, Uttar Pradesh, India
[2] Netaji Subhas Univ Technol, New Delhi 110078, India
[3] ISI, Kolkata 700108, W Bengal, India
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2022, Vol. 6, No. 5
Keywords
Sensors; Perturbation methods; Robustness; Noise measurement; Training; Robot sensing systems; Sensor systems; DRL; sensor; perturbation; robust; locomotion; actor-critic; OpenAI gym; FAULT; ROBOT;
DOI
10.1109/TETCI.2022.3157003
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Artificial agents in autonomous systems such as autonomous vehicles, robots, and drones make predictions on data generated by fusing values from many sources, such as different sensors. Sensor malfunction is a well-known problem in the robotics domain. In deep reinforcement learning (DRL), a correct sensor observation corresponds to the true estimate of a dimension of the state vector; noisy estimates from these sensors therefore lead to dimensionality impairment in the state. DRL policies have been shown to falter, choosing wrong actions, under adversarial attack or modeling error. It is therefore necessary to examine the effect of dimensionality perturbation on a neural policy. In this regard, we analyze whether a subtle dimensionality perturbation, arising from noise in the input source at test time, distracts the agent's decisions. We also propose RSAC (robust soft actor-critic), an approach that uses the noisy state for prediction but estimates the target from the nominal observation. We find that injecting such noisy input during training does not hamper learning. Simulations in the OpenAI Gym MuJoCo Walker2d-v2 environment show empirically that the proposed approach matches SAC's performance and is robust to test-time dimensionality perturbation.
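To make the abstract's training scheme concrete, the sketch below illustrates the core idea as we read it: the value prediction is computed from a dimensionality-perturbed (noisy) state, while the bootstrap target is computed from the nominal (clean) observation. This is a minimal illustration, not the authors' implementation; the linear critic, the single-dimension Gaussian corruption in `perturb_dimension`, and the noise scale `sigma` are all assumptions standing in for the full SAC machinery.

```python
# Minimal sketch (not the authors' code) of prediction-from-noisy-state,
# target-from-nominal-observation, as described in the RSAC abstract.
import numpy as np

rng = np.random.default_rng(0)

def perturb_dimension(state: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Corrupt one randomly chosen dimension of the state, mimicking a
    faulty sensor reading (assumed perturbation model)."""
    noisy = state.copy()
    idx = rng.integers(state.shape[0])
    noisy[idx] += rng.normal(0.0, sigma)
    return noisy

# Toy linear critic standing in for the SAC Q-network.
dim = 17                       # Walker2d-v2 observation size
w = rng.normal(size=dim) * 0.01
gamma, lr = 0.99, 1e-3

def q(s: np.ndarray) -> float:
    return float(w @ s)

def td_update(s: np.ndarray, r: float, s_next: np.ndarray) -> float:
    """One TD step: the prediction uses the noisy state, while the
    bootstrap target uses the nominal next observation."""
    global w
    s_noisy = perturb_dimension(s)       # prediction input is perturbed
    target = r + gamma * q(s_next)       # target uses the clean observation
    td_error = target - q(s_noisy)
    w += lr * td_error * s_noisy         # semi-gradient update
    return td_error

# Usage on random stand-in transitions:
for _ in range(5):
    s, s_next = rng.normal(size=dim), rng.normal(size=dim)
    print(f"TD error: {td_update(s, 1.0, s_next):+.4f}")
```

The design point this sketch isolates is that the noise is injected only on the prediction path during training, so the learning target stays anchored to the nominal observation; per the abstract, this exposure to noisy inputs is what confers test-time robustness without hampering learning.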
Pages: 1157-1166
Page count: 10