POMDP Manipulation via Trajectory Optimization

Citations: 0
Authors
Ngo Anh Vien [1]
Toussaint, Marc [1]
Affiliation
[1] Univ Stuttgart, Machine Learning & Robot Lab, Stuttgart, Germany
Source
2015 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) | 2015
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Efficient object manipulation based only on force feedback typically requires a plan of actively contact-seeking actions to reduce uncertainty over the true environmental model. In principle, that problem could be formulated as a full partially observable Markov decision process (POMDP) whose observations are sensed forces indicating the presence/absence of contacts with objects. Such a naive application leads to a very large POMDP with high-dimensional continuous state, action and observation spaces. Solving such large POMDPs is practically prohibitive. In other words, we are facing three challenging problems: 1) uncertainty over discontinuous contacts with objects; 2) high-dimensional continuous spaces; 3) optimization not only of trajectory cost but also of execution time. As trajectory optimization is a powerful model-based method for motion generation, it can handle the last two issues effectively by computing locally optimal trajectories. This paper aims to integrate the advantages of trajectory optimization into existing POMDP solvers. The full POMDP formulation is solved using sample-based approaches, where each sampled model is quickly evaluated via trajectory optimization instead of simulating a large number of rollouts. To further accelerate the solver, we propose to integrate temporal abstraction, i.e. macro actions or temporal actions, into the POMDP model. We demonstrate the proposed method on a simulated 7-DoF KUKA arm and a physical Willow Garage PR2 platform. The results show that our proposed method can effectively seek contacts in complex scenarios and achieve near-optimal path-planning performance.
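The core idea summarized in the abstract — evaluating each environment model sampled from the belief with a quick trajectory optimization, instead of simulating many rollouts — can be sketched as follows. This is an illustrative 1-D toy, not the authors' implementation: the function names, the Gaussian belief, and the linear-interpolation "optimizer" standing in for a locally optimal trajectory are all hypothetical simplifications.

```python
import random

def sample_models(belief, n):
    # Sample candidate environment models (e.g., an uncertain object
    # position) from a Gaussian belief over a 1-D state.
    return [random.gauss(belief["mean"], belief["std"]) for _ in range(n)]

def optimize_trajectory(start, goal, steps=20):
    # Toy stand-in for a trajectory optimizer: linear interpolation from
    # start to goal, with path length as the trajectory cost.
    traj = [start + (goal - start) * t / steps for t in range(steps + 1)]
    cost = sum(abs(traj[i + 1] - traj[i]) for i in range(steps))
    return traj, cost

def evaluate_macro_action(belief, goal, n_samples=16):
    # Evaluate a macro action (here: "move to goal") by averaging the
    # trajectory-optimization cost over sampled models, rather than by
    # simulating a large number of rollouts per sample.
    models = sample_models(belief, n_samples)
    costs = [optimize_trajectory(m, goal)[1] for m in models]
    return sum(costs) / len(costs)

random.seed(0)
belief = {"mean": 0.0, "std": 0.1}
candidate_goals = [0.5, 1.0, 2.0]
# Pick the macro action with the lowest expected trajectory cost.
best = min(candidate_goals, key=lambda g: evaluate_macro_action(belief, g))
```

In the paper's full setting the models are sampled contact configurations, the optimizer produces locally optimal arm trajectories, and macro actions give the temporal abstraction that accelerates the solver; the sketch only mirrors the evaluate-by-optimization structure.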
Pages: 242-249
Page count: 8