A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot

Cited by: 147

Authors:

Martinez-Cantin, Ruben [1]

de Freitas, Nando [2]

Brochu, Eric [2]

Castellanos, Jose [3]

Doucet, Arnaud [2]

Affiliations:

[1] Inst Super Tecn, Inst Syst & Robot, Lisbon, Portugal

[2] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1W5, Canada

[3] Univ Zaragoza, Dept Comp Sci & Syst Engn, Zaragoza, Spain

Source:

AUTONOMOUS ROBOTS | 2009 / Vol. 27 / Issue 02

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

Bayesian optimization; Online path planning; Sequential experimental design; Attention and gaze planning; Active vision; Dynamic sensor networks; Active learning; Policy search; Active SLAM; Model predictive control; Reinforcement learning; GLOBAL OPTIMIZATION; REINFORCEMENT; ALGORITHMS;

DOI:

10.1007/s10514-009-9130-2

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We address the problem of online path planning for optimal sensing with a mobile robot. The objective of the robot is to learn the most about its pose and the environment given time constraints. We use a POMDP with a utility function that depends on the belief state to model the finite horizon planning problem. We replan as the robot progresses throughout the environment. The POMDP is high-dimensional, continuous, non-differentiable, nonlinear, non-Gaussian and must be solved in real-time. Most existing techniques for stochastic planning and reinforcement learning are therefore inapplicable. To solve this extremely complex problem, we propose a Bayesian optimization method that dynamically trades off exploration (minimizing uncertainty in unknown parts of the policy space) and exploitation (capitalizing on the current best solution). We demonstrate our approach with a visually guided mobile robot. The solution proposed here is also applicable to other closely related domains, including active vision, sequential experimental design, dynamic sensing and calibration with mobile sensors.


Pages: 93-103

Page count: 11
