Robust Proximal Adversarial Reinforcement Learning Under Model Mismatch

Cited: 0
Authors
Zhai, Peng [1 ]
Wei, Xiaoyi [1 ]
Hou, Taixian [1 ]
Ji, Xiaopeng [2 ]
Dong, Zhiyan [3 ]
Yi, Jiafu [4 ]
Zhang, Lihua [5 ]
Affiliations
[1] Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China
[2] Zhejiang Univ, State Key Lab CAD & CG, Hangzhou 310058, Peoples R China
[3] Ji Hua Lab, Foshan 528251, Peoples R China
[4] Hainan Univ, Sch Informat & Commun Engn, Hainan 570228, Peoples R China
[5] Engn Res Ctr AI & Robot, Shanghai 200433, Peoples R China
Funding
China Postdoctoral Science Foundation; National Key R&D Program of China
Keywords
Perturbation methods; Training; Robustness; Reinforcement learning; Games; Noise; Complexity theory; Uncertainty; Transforms; Safety; Reinforcement learning (RL); machine learning for robot control; robust/adaptive control;
DOI
10.1109/LRA.2024.3472348
Chinese Library Classification
TP24 [Robotics]
Discipline Codes
080202; 1405
Abstract
Reinforcement learning (RL) can generate high-performance control policies for complex tasks in simulation through an end-to-end approach. However, the RL policy is not robust to uncertainties caused by modeling mismatch between simulation and real environments, making it difficult to transfer to the real world. In response to the above challenge, this letter introduces a lightweight and efficient robust RL algorithm. The algorithm transforms the optimization objective of the adversary from a long-term cumulative reward to a short-term reward, making the adversary focus on the performance in the near future. Additionally, the adversarial actions are projected onto a finite subset within the perturbation space using projected gradient descent, effectively constraining the adversary's strength and obtaining more robust policies. Extensive experiments in both simulated and real environments show that our algorithm improves the generalization ability of the policy for the modeling mismatch, outperforming the next best prior methods across almost all environments.
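The abstract's core mechanism, constraining the adversary by projecting its perturbations back into a bounded set with projected gradient descent, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the paper projects onto a finite subset of the perturbation space, whereas this sketch uses the simpler l-infinity-ball projection, and all function and parameter names (`grad_fn`, `epsilon`, `step_size`, `n_steps`) are illustrative.

```python
import numpy as np

def pgd_perturbation(grad_fn, x0, epsilon=0.1, step_size=0.02, n_steps=10):
    """Ascend the adversary's short-term objective, then project the
    accumulated perturbation back onto the l-infinity ball of radius
    epsilon, which keeps the adversary's strength bounded."""
    delta = np.zeros_like(x0)
    for _ in range(n_steps):
        g = grad_fn(x0 + delta)                    # gradient of the adversary's objective
        delta = delta + step_size * np.sign(g)     # signed gradient-ascent step
        delta = np.clip(delta, -epsilon, epsilon)  # projection onto the epsilon-ball
    return x0 + delta

# Toy objective: the adversary pushes each observation coordinate
# further from zero (gradient of 0.5 * ||z||^2 is z itself).
x = np.array([0.5, -0.3])
x_adv = pgd_perturbation(lambda z: z, x, epsilon=0.1)
```

Because each iterate is clipped to the epsilon-ball, the final perturbation can never exceed epsilon per coordinate regardless of how many ascent steps are taken; this bounded-strength property is what the abstract credits for yielding more robust policies.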
Pages: 10248-10255 (8 pages)