Collaborative promotion: Achieving safety and task performance by integrating imitation reinforcement learning

Times cited: 0
Authors
Zhang, Cai [1]
Zhang, Xiaoxiong [2, 3]
Zhang, Hui [2, 3]
Zhu, Fei [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
[2] Natl Univ Def Technol, Sixty Res Inst 3, Nanjing 210007, Peoples R China
[3] Natl Univ Def Technol, Lab big data & decis, Changsha 410073, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Safe reinforcement learning; Imitation learning; Dual policy networks; Multi-objective optimization; Loose coupling
DOI
10.1016/j.eswa.2024.124820
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Although the importance of safety is self-evident for artificial intelligence, safety and task performance are like two sides of a coin: focusing excessively on safety without considering task performance may make the agent conservative and hesitant. How to strike a balance between safety and task performance has therefore been a pressing concern. To address this issue, we introduce Collaborative Promotion (CP), which is designed to harmonize safety and task objectives, thereby enabling loosely coupled optimization of the two objectives. CP is a novel dual-policy framework in which the safety and task objectives are assigned as primary goals to the safety policy framework and the task policy framework, respectively. In each framework, an actor-critic architecture uses the value function to guide improvement of the primary objective. With the aid of imitation learning, the secondary objective is optimized through behavioral cloning, with each framework treating the other as the expert in its domain. The safety policy framework employs a weighted-sum method for multi-objective optimization, establishing a primary-secondary relationship that facilitates loosely coupled optimization of the safety and task objectives. In the Safe Navigation and Safe Velocity domains, we benchmark CP against task-specific and safety-specific algorithms. Extensive experiments demonstrate that CP achieves the intended goals.
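The abstract describes the safety policy's loss as a weighted sum of a primary objective (guided by its own critic) and a secondary behavioral-cloning objective that treats the task policy as the expert. The sketch below illustrates that scalarization only; it is not the authors' implementation. It assumes deterministic continuous actions, a learned cost critic whose estimate the safety policy minimizes, mean-squared-error cloning, and the hypothetical weights `w_primary = 0.8` and `w_secondary = 0.2`. All function names (`behavior_cloning_loss`, `safety_policy_loss`, `toy_cost`) are illustrative.

```python
from typing import Callable, Sequence

# Hypothetical type aliases for this sketch: a state or action is a flat
# vector of floats, and a cost critic maps (state, action) to a scalar cost.
Vector = Sequence[float]
CostCritic = Callable[[Vector, Vector], float]


def behavior_cloning_loss(own: Sequence[Vector], expert: Sequence[Vector]) -> float:
    """Mean squared error between this policy's actions and the expert's actions.

    Assumption: cloning is done by regressing deterministic actions with MSE.
    """
    if len(own) != len(expert) or not own:
        raise ValueError("need equally sized, non-empty action batches")
    total = 0.0
    for a, e in zip(own, expert):
        total += sum((ai - ei) ** 2 for ai, ei in zip(a, e))
    return total / len(own)


def safety_policy_loss(
    states: Sequence[Vector],
    safety_actions: Sequence[Vector],
    task_actions: Sequence[Vector],
    cost_critic: CostCritic,
    w_primary: float = 0.8,
    w_secondary: float = 0.2,
) -> float:
    """Weighted-sum loss for the safety policy (lower is better).

    Primary objective: mean cost estimated by the safety critic.
    Secondary objective: clone the task policy's actions (task policy as expert).
    Requiring w_primary >= w_secondary >= 0 is this sketch's hypothetical way of
    encoding the primary-secondary relationship; the paper's weighting may differ.
    """
    if w_primary < w_secondary or w_secondary < 0:
        raise ValueError("expected w_primary >= w_secondary >= 0")
    if len(states) != len(safety_actions) or not states:
        raise ValueError("need one action per state")
    primary = sum(cost_critic(s, a) for s, a in zip(states, safety_actions)) / len(states)
    secondary = behavior_cloning_loss(safety_actions, task_actions)
    return w_primary * primary + w_secondary * secondary


if __name__ == "__main__":
    # Toy cost critic (hypothetical): ignores the state; larger actions cost more.
    def toy_cost(state: Vector, action: Vector) -> float:
        return sum(x * x for x in action)

    loss = safety_policy_loss(
        states=[[0.0], [1.0]],
        safety_actions=[[0.1], [0.2]],
        task_actions=[[0.5], [0.4]],
        cost_critic=toy_cost,
    )
    print(round(loss, 4))  # 0.04
```

In the toy run, the primary term is (0.01 + 0.04) / 2 = 0.025 and the cloning term is (0.16 + 0.04) / 2 = 0.1, so the loss is 0.8 × 0.025 + 0.2 × 0.1 = 0.04. In a real actor-critic setup these terms would be computed on differentiable tensors and minimized by gradient descent. Under the same assumptions, the task policy would mirror this structure, with a reward critic as its primary objective and the safety policy as its expert.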
Pages: 12
Related papers
50 in total
  • [1] Integrating Sporadic Imitation in Reinforcement Learning Robots
    Richert, Willi
    Scheller, Ulrich
    Koch, Markus
    Kleinjohann, Bernd
    Stern, Claudius
    ADPRL: 2009 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2009, : 193 - 198
  • [2] Using reinforcement learning to adapt an imitation task
    Guenter, Florent
    Billard, Aude G.
    2007 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-9, 2007, : 1028 - 1033
  • [3] Combining imitation and deep reinforcement learning to human-level performance on a virtual foraging task
    Giammarino, Vittorio
    Dunne, Matthew F.
    Moore, Kylie N.
    Hasselmo, Michael E.
    Stern, Chantal E.
    Paschalidis, Ioannis Ch
    ADAPTIVE BEHAVIOR, 2024, 32 (03) : 251 - 263
  • [4] Generalized Path Planning for Collaborative UAVs using Reinforcement and Imitation Learning
    Farley, Jack
    Chapnevis, Amirahmad
    Bulut, Eyuphan
    PROCEEDINGS OF THE 2023 INTERNATIONAL SYMPOSIUM ON THEORY, ALGORITHMIC FOUNDATIONS, AND PROTOCOL DESIGN FOR MOBILE NETWORKS AND MOBILE COMPUTING, MOBIHOC 2023, 2023, : 457 - 462
  • [5] Comparison of multiple reinforcement learning and deep reinforcement learning methods for the task aimed at achieving the goal
    Parak, R.
    Matousek, R.
    MENDEL, 2021, 27 (01) : 1 - 8
  • [6] Task Independent Safety Assessment for Reinforcement Learning
    Jocas, Mark
    Zoghlami, Firas
    Kurrek, Philip
    Gianni, Mario
    Salehi, Vahid
    TOWARDS AUTONOMOUS ROBOTIC SYSTEMS, TAROS 2022, 2022, 13546 : 190 - 204
  • [7] Task-Agnostic Safety for Reinforcement Learning
    Rahman, Md Asifur
    Alqahtani, Sarra
    PROCEEDINGS OF THE 16TH ACM WORKSHOP ON ARTIFICIAL INTELLIGENCE AND SECURITY, AISEC 2023, 2023, : 139 - 148
  • [8] Anticipatory model of musical style imitation using collaborative and competitive reinforcement learning
    Cont, Arshia
    Dubnov, Shlomo
    Assayag, Gerard
    ANTICIPATORY BEHAVIOR IN ADAPTIVE LEARNING SYSTEMS: FROM BRAINS TO INDIVIDUAL AND SOCIAL BEHAVIOR, 2007, 4520 : 285 - +
  • [9] Deep Reinforcement Learning for Articulatory Synthesis in a Vowel-to-Vowel Imitation Task
    Shitov, Denis
    Pirogova, Elena
    Wysocki, Tadeusz A.
    Lech, Margaret
    SENSORS, 2023, 23 (07)
  • [10] Task Offloading in Computing Continuum Using Collaborative Reinforcement Learning
    Robles-Enciso, Alberto
    Skarmeta, Antonio F.
    INTERNET OF THINGS, GIOTS 2022, 2022, 13533 : 82 - 95