Bayesian reinforcement learning for navigation planning in unknown environments

Times Cited: 1
Authors
Alali, Mohammad [1 ]
Imani, Mahdi [1 ]
Affiliations
[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Source
FRONTIERS IN ARTIFICIAL INTELLIGENCE | 2024, Vol. 7
Keywords
rescue operations; Markov decision process; reinforcement learning; Bayesian decision-making; navigation planning; ROBOT NAVIGATION; EXPLORATION; UAVS;
DOI
10.3389/frai.2024.1308031
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This study focuses on a rescue mission problem, particularly enabling agents/robots to navigate efficiently in unknown environments. Technological advances in manufacturing, sensing, and communication systems have raised interest in using robots or drones for rescue operations. Effective rescue operations require quickly identifying changes in the environment and locating victims as soon as possible. Several techniques have been developed in recent years for autonomy in rescue missions, including motion planning, adaptive control, and, more recently, reinforcement learning. These techniques rely on full knowledge of the environment or on simulators that can represent real environments during rescue operations. In practice, however, agents might have little or no information about the environment or about the number and locations of victims, preventing or limiting the application of most existing techniques. This study provides a probabilistic/Bayesian representation of the unknown environment, which jointly models the stochasticity in the agent's navigation and the uncertainty about the environment in a vector called the belief state. This belief state allows offline learning of the optimal Bayesian policy in an unknown environment without any real data or interactions, guaranteeing actions that are optimal given all available information. To address the large size of the belief space, a deep reinforcement learning method is developed for computing an approximate Bayesian planning policy. Numerical experiments on several maze problems demonstrate the high performance of the proposed policy.
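The belief-state construction described in the abstract lends itself to a short illustration. Below is a minimal Python sketch, not the authors' implementation: it assumes a hypothetical binary wall sensor with an arbitrary 0.95/0.05 noise model (`likelihood`), a small set of candidate maze layouts (`maps`), a Bayes-rule posterior update over those layouts, and a belief-state vector formed by concatenating the agent's pose with the posterior, which a deep RL policy (e.g., a DQN) would take as input.

```python
# Minimal sketch (not the authors' code): Bayesian belief update over
# candidate maze layouts. All names here (likelihood, belief_update,
# maps) are illustrative assumptions, not from the paper.
import numpy as np

def likelihood(obs, agent_pos, grid):
    """P(obs | map): assumed sensor model giving probability 0.95 when
    the wall reading matches the candidate map and 0.05 otherwise."""
    r, c = agent_pos
    true_obs = grid[r, c]          # e.g., 1 = wall sensed, 0 = free
    return 0.95 if obs == true_obs else 0.05

def belief_update(belief, obs, agent_pos, maps):
    """One step of Bayes' rule: posterior ∝ likelihood × prior."""
    post = np.array([likelihood(obs, agent_pos, m) for m in maps]) * belief
    return post / post.sum()

# Toy example: two candidate 2x2 layouts differing in one cell.
maps = [np.array([[0, 1], [0, 0]]), np.array([[0, 0], [0, 0]])]
belief = np.full(len(maps), 0.5)   # uniform prior over layouts
belief = belief_update(belief, obs=1, agent_pos=(0, 1), maps=maps)
print(belief)  # mass shifts toward the layout consistent with the reading

# Belief state = (pose, posterior); a DQN-style network would map this
# vector to Q-values over the navigation actions.
pose = np.array([0, 1])
belief_state = np.concatenate([pose, belief])
```

Because the belief state summarizes everything the agent knows, a policy trained offline over such vectors can act optimally given all available information, which is the property the abstract highlights; the deep network is needed only because enumerating beliefs exactly becomes intractable as the number of candidate environments grows.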
Pages: 17