Risk-Sensitive Mobile Robot Navigation in Crowded Environment via Offline Reinforcement Learning

Cited by: 0
Authors
Wu, Jiaxu [1 ]
Wang, Yusheng [1 ]
Asama, Hajime [1 ]
An, Qi [2 ]
Yamashita, Atsushi [2 ]
Affiliations
[1] Univ Tokyo, Grad Sch Engn, Dept Precis Engn, 7-3-1 Hongo,Bunkyo Ku, Tokyo 1138656, Japan
[2] Univ Tokyo, Grad Sch Frontier Sci, Dept Human & Engn Environm Studies, 5-1-5 Kashiwanoha, Kashiwa, Chiba 2778563, Japan
Source
2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) | 2023
Keywords
COLLISION-AVOIDANCE;
DOI
10.1109/IROS55552.2023.10341948
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Mobile robot navigation in human-populated environments, often referred to as crowd navigation, has attracted great interest from the research community in recent years. Offline reinforcement learning (RL)-based methods have recently been introduced to this domain for their ability to alleviate the sim2real gap incurred by online RL, which relies on simulators for training, and for their scalability: the same dataset can be reused to train policies for differently customized rewards. However, the performance of the learned navigation policy suffers from the distributional shift between the training data and the inputs encountered during deployment: when the policy receives an input outside the training data distribution, it risks choosing an erroneous action that leads to catastrophic failure, such as colliding with a human. To realize risk sensitivity and improve the safety of the offline RL agent during deployment, this work proposes a multi-policy control framework that combines an offline RL navigation policy with a risk detector and a force-based risk-avoiding policy. In particular, a Lyapunov density model is learned on the latent features of the offline RL policy and serves as a risk detector, switching control to the risk-avoiding policy when the robot tends to leave the region supported by the training data. Experimental results showed that the proposed method learned crowd navigation from an offline trajectory dataset, and that the risk detector substantially reduced the collision rate of the vanilla offline RL agent while maintaining navigation efficiency, outperforming state-of-the-art methods.
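The control-switching mechanism summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: all component names (`offline_policy`, `risk_avoid_policy`, `latent_feature`, `lyapunov_density`, `threshold`) and their toy bodies are hypothetical stand-ins for the learned models described in the paper; only the switching logic itself follows the abstract.

```python
import numpy as np

# Toy stand-ins for the learned components (hypothetical, for illustration).
def offline_policy(obs):
    return np.array([1.0, 0.0])       # e.g. proceed toward the goal

def risk_avoid_policy(obs):
    return np.array([-1.0, 0.0])      # e.g. force-based retreat from humans

def latent_feature(obs):
    return obs                        # stand-in for the policy's encoder

def lyapunov_density(z):
    # Stand-in for the learned Lyapunov density model: high inside the
    # training-data support, decaying to zero outside it.
    return float(np.exp(-np.linalg.norm(z)))

def select_action(obs, threshold=0.1):
    """Multi-policy switching: fall back to the risk-avoiding policy when
    the density model signals the robot is leaving the data-supported region."""
    z = latent_feature(obs)
    if lyapunov_density(z) < threshold:
        return risk_avoid_policy(obs)
    return offline_policy(obs)
```

An in-distribution observation keeps the offline RL policy in control, while an out-of-distribution one triggers the fallback.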
Pages: 7456 - 7462
Page count: 7