Markov Decision Processes with Unknown State Feature Values for Safe Exploration using Gaussian Processes

Cited by: 10
Authors
Budd, Matthew [1 ]
Lacerda, Bruno [1 ]
Duckworth, Paul [1 ]
West, Andrew [2 ]
Lennox, Barry [2 ]
Hawes, Nick [1 ]
Affiliations
[1] Univ Oxford, Dept Engn Sci, Oxford, England
[2] Univ Manchester, Dept Elect & Elect Engn, Manchester, Lancs, England
Source
2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) | 2020
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); UK Research and Innovation (UKRI)
Keywords
DOI
10.1109/IROS45743.2020.9341589
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
When exploring an unknown environment, a mobile robot must decide where to observe next. It must do this whilst minimising the risk of failure, by only exploring areas that it expects to be safe. In this context, safety refers to the robot remaining in regions where critical environment features (e.g. terrain steepness, radiation levels) are within ranges the robot is able to tolerate. More specifically, we consider a setting where a robot explores an environment modelled with a Markov decision process, subject to bounds on the values of one or more environment features which can only be sensed at runtime. We use a Gaussian process to predict the value of the environment feature in unvisited regions, and propose an estimated Markov decision process, a model that integrates the Gaussian process predictions with the environment model transition probabilities. Building on this model, we propose an exploration algorithm that, contrary to previous approaches, considers probabilistic transitions and explicitly reasons about the uncertainty over the Gaussian process predictions. Furthermore, our approach increases the speed of exploration by selecting locations to visit further away from the currently explored area. We evaluate our approach on a real-world gamma radiation dataset, tackling the challenge of a nuclear material inspection robot exploring an a priori unknown area.
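The core idea described in the abstract can be sketched in code: fit a Gaussian process to feature values sensed at visited locations, then treat an unvisited state as safe only if the GP posterior gives high probability that the feature stays within the tolerated range. This is a minimal illustrative sketch, not the authors' implementation; the data, kernel choice, threshold, and confidence level below are assumptions for demonstration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Visited grid cells (x, y) and the environment feature values sensed
# there at runtime (e.g. radiation level); synthetic data for illustration.
X_visited = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_sensed = np.array([0.1, 0.3, 0.2, 0.9])

# GP regression over the feature; alpha models sensor noise.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
gp.fit(X_visited, y_sensed)

def safety_probability(state_xy, threshold):
    """P(feature value at state_xy <= threshold) under the GP posterior."""
    mean, std = gp.predict(np.array([state_xy]), return_std=True)
    return norm.cdf(threshold, loc=mean[0], scale=std[0])

# A candidate state is treated as safe only if the GP is sufficiently
# confident that the feature stays within the robot's tolerated range.
p = safety_probability([0.5, 0.5], threshold=1.0)
is_safe = p >= 0.95
```

In the paper's setting these per-state safety estimates are combined with the MDP's transition probabilities (the "estimated Markov decision process"), so the robot reasons about the chance of entering an unsafe region along a whole policy, not just at a single cell.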
Pages: 7344-7350
Number of pages: 7