Learning of Behavior Trees for Autonomous Agents

被引:55
作者
Colledanchise, Michele [1 ]
Parasuraman, Ramviyas [2 ]
Ogren, Petter [3 ]
机构
[1] Ist Italiano Tecnol, iCub Facil, I-16163 Genoa, Italy
[2] Purdue Univ, SMART Lab, W Lafayette, IN 47907 USA
[3] Royal Inst Technol KTH, S-11428 Stockholm, Sweden
关键词
Manuscript received April 10; 2017; revised September 11; accepted February 2; 2018. Date of publication March 19; 2018; date of current version June 14; 2019. This work was supported by the Swedish Foundation for Strategic Research through the Swedish Maritime Robotics Center. (Corresponding author: Michele Colledanchise.) M. Colledanchise is with the iCub Facility; Istituto Italiano di Tecnologia; 16163; Genova; Italy; (e-mail:; michele.colledanchise@iit.it). R. Parasuraman is with the SMART Lab; Purdue University; West Lafayette; IN; 47907; USA; ramviyas@purdue.edu). P. Ögren is with the Royal Institute of Technology (KTH); Stockholm; 11428; Sweden; petter@kth.se). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TG.2018.2816806 Fig. 1. Mario AI benchmark. The agent can observe only a small subset of the environment; namely the receptive field. (a) simple Scenario. (b) Complex scenario;
D O I
10.1109/TG.2018.2816806
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, we study the problem of automatically synthesizing a successful behavior tree (BT) in an a priori unknown dynamic environment. Starting with a given set of actions, a reward function, and sensing in terms of a set of binary conditions, the proposed algorithm incrementally learns a switching structure, in terms of a BT, that is able to handle the situation encountered. Exploiting the fact that BTs generalize AND-OR -trees and also provide very natural chromosome mappings for genetic programming. we combine the long-term performance of genetic programming with a greedy element and use the AND-OR analogy to limit the size of the resulting structure. Finally, earlier results on BTs enable us to provide certain safety guarantees for the resulting system. Using the testing environment Mario AI, we compare our approach to alternative methods for learning BTs and finite state machines. The evaluation shows that the proposed approach generated solutions with better performance, and often fewer nodes than the other two methods.
引用
收藏
页码:183 / 189
页数:7
相关论文
共 25 条
[1]  
[Anonymous], 1994, COMPUTATIONAL INTELL
[2]   Robotics in remote and hostile environments [J].
Bellingham, James G. ;
Rajan, Kanna .
SCIENCE, 2007, 318 (5853) :1098-1102
[3]  
Busoniu L, 2010, AUTOM CONTROL ENG SE, P1, DOI 10.1201/9781439821091-f
[4]  
Colledanchise M., 2017, ARXIV170900084 CORR
[5]   How Behavior Trees Modularize Hybrid Control Systems and Generalize Sequential Behavior Compositions, the Subsumption Architecture, and Decision Trees [J].
Colledanchise, Michele ;
Ogren, Petter .
IEEE TRANSACTIONS ON ROBOTICS, 2017, 33 (02) :372-389
[6]  
Colledanchise M, 2014, IEEE INT CONF ROBOT, P3265, DOI 10.1109/ICRA.2014.6907328
[7]  
Croon G. D., 2005, P 8 EUR C ART LIF WO
[8]   Evolutionary robotics approach to odor source localization [J].
de Croon, G. C. H. E. ;
O'Connor, L. M. ;
Nicol, C. ;
Izzo, D. .
NEUROCOMPUTING, 2013, 121 :481-497
[9]   A genetic algorithm-based approach for building accurate decision trees [J].
Fu, ZW ;
Golden, BL ;
Lele, S ;
Raghavan, S ;
Wasil, EA .
INFORMS JOURNAL ON COMPUTING, 2003, 15 (01) :3-22
[10]   An integrated approach of learning, planning, and execution [J].
García-Martínez, R ;
Borrajo, D .
JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 2000, 29 (01) :47-78