Risk-Sensitive Mobile Robot Navigation in Crowded Environment via Offline Reinforcement Learning

Cited by: 0
Authors
Wu, Jiaxu [1 ]
Wang, Yusheng [1 ]
Asama, Hajime [1 ]
An, Qi [2 ]
Yamashita, Atsushi [2 ]
Affiliations
[1] Univ Tokyo, Grad Sch Engn, Dept Precis Engn, 7-3-1 Hongo,Bunkyo Ku, Tokyo 1138656, Japan
[2] Univ Tokyo, Grad Sch Frontier Sci, Dept Human & Engn Environm Studies, 5-1-5 Kashiwanoha, Kashiwa, Chiba 2778563, Japan
Source
2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) | 2023
Keywords
COLLISION-AVOIDANCE;
DOI
10.1109/IROS55552.2023.10341948
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Mobile robot navigation in human-populated environments, often referred to as crowd navigation, has attracted great interest from the research community in recent years. Offline reinforcement learning (RL)-based methods have recently been introduced to this domain for their ability to alleviate the sim2real gap incurred by online RL, which relies on simulators for training, and for their scalability: the same dataset can be reused to train policies for differently customized rewards. However, the performance of the learned navigation policy suffers from the distributional shift between the training data and the inputs encountered during deployment: when the policy receives an input outside the training data distribution, it risks choosing an erroneous action that leads to catastrophic failure, such as colliding with a human. To realize risk sensitivity and improve the safety of the offline RL agent during deployment, this work proposes a multi-policy control framework that combines an offline RL navigation policy with a risk detector and a force-based risk-avoiding policy. In particular, a Lyapunov density model is learned on the latent features of the offline RL policy and serves as a risk detector, switching control to the risk-avoiding policy when the robot tends to leave the region supported by the training data. Experimental results showed that the proposed method learned crowd navigation from an offline trajectory dataset, and that the risk detector substantially reduced the collision rate of the vanilla offline RL agent while maintaining navigation efficiency, outperforming state-of-the-art methods.
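The control-switching mechanism summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: all component names (`offline_policy`, `risk_avoid_policy`, `latent_feature`, `lyapunov_density`, `threshold`) and their toy bodies are hypothetical stand-ins for the learned models described in the paper; only the switching logic itself follows the abstract.

```python
import numpy as np

# Toy stand-ins for the learned components (hypothetical, for illustration).
def offline_policy(obs):
    return np.array([1.0, 0.0])       # e.g. proceed toward the goal

def risk_avoid_policy(obs):
    return np.array([-1.0, 0.0])      # e.g. force-based retreat from humans

def latent_feature(obs):
    return obs                        # stand-in for the policy's encoder

def lyapunov_density(z):
    # Stand-in for the learned Lyapunov density model: high inside the
    # training-data support, decaying to zero outside it.
    return float(np.exp(-np.linalg.norm(z)))

def select_action(obs, threshold=0.1):
    """Multi-policy switching: fall back to the risk-avoiding policy when
    the density model signals the robot is leaving the data-supported region."""
    z = latent_feature(obs)
    if lyapunov_density(z) < threshold:
        return risk_avoid_policy(obs)
    return offline_policy(obs)
```

An in-distribution observation keeps the offline RL policy in control, while an out-of-distribution one triggers the fallback.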
Pages: 7456 - 7462
Page count: 7