A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot

Cited by: 142
Authors
Martinez-Cantin, Ruben [1]
de Freitas, Nando [2]
Brochu, Eric [2]
Castellanos, Jose [3]
Doucet, Arnaud [2]
Affiliations
[1] Inst Super Tecn, Inst Syst & Robot, Lisbon, Portugal
[2] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1W5, Canada
[3] Univ Zaragoza, Dept Comp Sci & Syst Engn, Zaragoza, Spain
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Bayesian optimization; Online path planning; Sequential experimental design; Attention and gaze planning; Active vision; Dynamic sensor networks; Active learning; Policy search; Active SLAM; Model predictive control; Reinforcement learning; Global optimization; Reinforcement; Algorithms
DOI
10.1007/s10514-009-9130-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We address the problem of online path planning for optimal sensing with a mobile robot. The objective of the robot is to learn the most about its pose and the environment given time constraints. We use a POMDP with a utility function that depends on the belief state to model the finite horizon planning problem. We replan as the robot progresses through the environment. The POMDP is high-dimensional, continuous, non-differentiable, nonlinear, non-Gaussian and must be solved in real time. Most existing techniques for stochastic planning and reinforcement learning are therefore inapplicable. To solve this extremely complex problem, we propose a Bayesian optimization method that dynamically trades off exploration (minimizing uncertainty in unknown parts of the policy space) and exploitation (capitalizing on the current best solution). We demonstrate our approach with a visually guided mobile robot. The solution proposed here is also applicable to other closely related domains, including active vision, sequential experimental design, dynamic sensing and calibration with mobile sensors.
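The exploration-exploitation trade-off named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: a one-dimensional policy parameter on [0, 1], a hypothetical `toy_cost` standing in for the expensive expected-uncertainty cost of a policy, a Gaussian process surrogate with a squared-exponential kernel, and expected improvement as the acquisition function.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit prior variance (assumed choice)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    """Zero-mean GP posterior mean and standard deviation at x_query."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v * v, axis=0)  # prior variance k(x, x) is 1
    return mean, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mean, std, best_cost):
    """Expected improvement for minimization.

    The first term rewards a low predicted cost (exploitation); the second
    rewards high predictive uncertainty (exploration).
    """
    z = (best_cost - mean) / std
    cdf = 0.5 * (1.0 + np.array([math.erf(v) for v in z / math.sqrt(2.0)]))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best_cost - mean) * cdf + std * pdf

def toy_cost(theta):
    """Hypothetical stand-in for the expensive cost of policy parameter theta."""
    return (theta - 0.7) ** 2 + 0.1 * np.sin(15.0 * theta)

def bayes_opt(cost, n_init=3, n_iter=15, seed=0):
    """Query the cost where expected improvement is largest, then refit."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)  # candidate policy parameters
    x = rng.choice(grid, size=n_init, replace=False)
    y = cost(x)
    for _ in range(n_iter):
        mean, std = gp_posterior(x, y, grid)
        ei = expected_improvement(mean, std, y.min())
        x_next = grid[np.argmax(ei)]
        x = np.append(x, x_next)
        y = np.append(y, cost(x_next))
    best = np.argmin(y)
    return x[best], y[best], y

if __name__ == "__main__":
    theta, cost_value, history = bayes_opt(toy_cost)
    print(f"best theta: {theta:.3f}, cost: {cost_value:.4f}, evaluations: {len(history)}")
```

In the setting the abstract describes, each cost evaluation would presumably involve simulating the robot's belief over a finite horizon, which is why a surrogate model that limits the number of evaluations is attractive; the grid search over a single parameter here is only for readability.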
Pages: 93-103
Page count: 11