Collaborative promotion: Achieving safety and task performance by integrating imitation reinforcement learning

Cited by: 0
Authors
Zhang, Cai [1]
Zhang, Xiaoxiong [2,3]
Zhang, Hui [2,3]
Zhu, Fei [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
[2] Natl Univ Def Technol, Sixty Res Inst 3, Nanjing 210007, Peoples R China
[3] Natl Univ Def Technol, Lab big data & decis, Changsha 410073, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Safe reinforcement learning; Imitation learning; Dual policy networks; Multi-objective optimization; Loose coupling;
DOI
10.1016/j.eswa.2024.124820
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although the importance of safety for artificial intelligence is self-evident, safety and task performance are two sides of the same coin: focusing excessively on safety without considering task performance may make the agent conservative and hesitant. How to strike a balance between safety and task performance is a pressing concern. To address this issue, we introduce Collaborative Promotion (CP), which is designed to harmonize safety and task objectives, thereby enabling loosely coupled optimization of the two objectives. CP is a novel dual-policy framework in which the safety and task objectives are assigned to the safety policy framework and the task policy framework, respectively, as their primary goals. An actor-critic framework is constructed in which the value function guides the improvement of these primary objectives. With the aid of imitation learning, the secondary objective is optimized through behavioral cloning, with each framework treating the other as an expert in its domain. The safety policy framework employs a weighted-sum method for multi-objective optimization, establishing a primary-secondary relationship that facilitates loosely coupled optimization of the safety and task objectives. In the Safe Navigation and Safe Velocity domains, we benchmark CP against task-specific and safety-specific algorithms. Extensive experiments demonstrate that CP achieves the intended goals.
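The weighted-sum, primary-secondary idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the mean-squared-error form of the behavioral cloning term, the weight of 0.75, and all numbers are assumptions chosen only to show how a primary loss and an imitation loss might be combined.

```python
from typing import Sequence


def behavioral_cloning_loss(own_actions: Sequence[float],
                            expert_actions: Sequence[float]) -> float:
    """Mean squared error between a policy's actions and the other policy's actions.

    Assumption: the abstract only says "behavioral cloning"; MSE is one common choice.
    """
    if len(own_actions) != len(expert_actions):
        raise ValueError("action sequences must have the same length")
    if len(own_actions) == 0:
        raise ValueError("action sequences must be non-empty")
    squared = [(a - e) ** 2 for a, e in zip(own_actions, expert_actions)]
    return sum(squared) / len(squared)


def weighted_sum_loss(primary_loss: float, secondary_loss: float,
                      primary_weight: float = 0.75) -> float:
    """Combine a primary and a secondary objective with a weighted sum.

    A primary_weight above 0.5 keeps the primary objective dominant
    (the default of 0.75 is a hypothetical value, not taken from the paper).
    """
    if not 0.0 <= primary_weight <= 1.0:
        raise ValueError("primary_weight must lie in [0, 1]")
    return primary_weight * primary_loss + (1.0 - primary_weight) * secondary_loss


# Hypothetical numbers for illustration only.
safety_actions = [0.0, 0.5, 1.0, 1.5]   # actions proposed by the safety policy
task_actions = [0.0, 1.0, 2.0, 3.0]     # task policy, treated as the task "expert"
safety_primary_loss = 2.0               # stand-in for a value-guided safety loss

bc = behavioral_cloning_loss(safety_actions, task_actions)
total = weighted_sum_loss(safety_primary_loss, bc, primary_weight=0.75)
print(bc, total)
```

In this sketch the safety policy's own objective carries most of the weight, while the cloning term pulls its actions toward the task policy's, which is one way to read the "loosely coupled" primary-secondary relationship.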
Pages: 12