Black-Box Reward Attacks Against Deep Reinforcement Learning Based on Successor Representation

Cited: 3
Authors
Cai, Kanting [1]
Zhu, Xiangbin [1]
Hu, Zhao-Long [1]
Affiliations
[1] Zhejiang Normal Univ, Coll Math & Comp Sci, Jinhua 321004, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Perturbation methods; Neural networks; Reinforcement learning; Training; Timing; Deep learning; Data models; Black-box attacks; corrupted rewards; deep reinforcement learning; successor representation;
DOI
10.1109/ACCESS.2022.3174963
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Although deep reinforcement learning (DRL) has been widely adopted in various fields, studying the vulnerability of DRL has become an important research topic for improving the robustness of DRL agents. Adversarial attack methods based on white-box models, in which the adversary can access all the information of the victim, have been intensively investigated. However, in most practical situations, the adversary cannot obtain the internal information of the victim's neural network. Furthermore, for reward-based attacks, the agent can perform anomaly detection on the perturbed rewards to detect whether it has been attacked. In this paper, we propose a black-box attack method with corrupted rewards, which exploits the exploration mechanisms of DRL to improve the effectiveness of attacks on agents. The adversary trains a deep neural network in advance to learn the successor representation (SR) of each state. The adversary then uses the SR values to determine the timing of attacks and to generate imperceptible adversarial perturbations. Experimental results show that the proposed SR-based black-box attack algorithm can effectively attack agents with fewer adversarial samples.
Pages: 51548-51560
Number of Pages: 13