Simplified Risk-aware Decision Making with Belief-dependent Rewards in Partially Observable Domains

Cited by: 5
Authors
Zhitnikov, Andrey [1]
Indelman, Vadim [2]
Affiliations
[1] Technion Autonomous Syst Program TASP, IL-3200003 Haifa, Israel
[2] Technion Israel Inst Technol, Dept Aerosp Engn, IL-32000 Haifa, Israel
Funding
Israel Science Foundation;
Keywords
Artificial intelligence; Decision making under uncertainty; Belief space planning; POMDP;
DOI
10.1016/j.artint.2022.103775
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the recent advent of risk awareness, the complexity of decision-making algorithms increases, making such problem formulations severely difficult to solve online. Our approach centers on the distribution of the return in the challenging continuous domain under partial observability. This paper proposes a simplification framework to ease the computational burden while providing guarantees on the simplification's impact. On top of this framework, we present novel stochastic bounds on the return that apply to any reward function. Further, we consider the simplification's impact on decision making with risk-averse objectives, which, to the best of our knowledge, has not been investigated thus far. In particular, we prove that stochastic bounds on the return yield deterministic bounds on Value at Risk. The second part of the paper focuses on the joint distribution of a pair of returns given a pair of candidate policies, thereby, for the first time, accounting for the correlation between these returns. Here, we propose a novel risk-averse objective and apply our simplification paradigm. Moreover, we present a novel tool called the probabilistic loss (PLoss) to completely characterize the simplification impact for any objective operator in this setting. We provably bound the cumulative and tail distribution functions of PLoss using PbLoss, providing such a characterization online using only the simplified problem. In addition, we utilize this tool to offer deterministic guarantees for the simplification in the context of our novel risk-averse objective. We apply our proposed framework to a particular simplification technique: reducing the number of samples used for reward calculation or belief representation within planning. Finally, we verify the advantages of our approach through extensive simulations. (C) 2022 Elsevier B.V. All rights reserved.
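The abstract's claim that sample-wise (stochastic) bounds on the return translate into deterministic bounds on Value at Risk can be illustrated with a minimal sketch. The function names, the empirical lower-quantile estimator, and the pairwise-bound assumption below are illustrative assumptions for this sketch, not the paper's actual construction:

```python
import numpy as np

def value_at_risk(returns, alpha):
    """Empirical Value at Risk at level alpha: the lower empirical
    alpha-quantile of the return samples (lower return = worse outcome)."""
    r = np.sort(np.asarray(returns, dtype=float))
    idx = int(np.ceil(alpha * len(r))) - 1  # index of the lower alpha-quantile
    return r[max(idx, 0)]

def var_bounds(lower_samples, upper_samples, alpha):
    """If L_i <= R_i <= U_i holds for every sample i (stochastic bounds on
    the return), the sorted order statistics inherit the same ordering, so
    VaR computed from the bounding samples brackets the true VaR
    deterministically."""
    return value_at_risk(lower_samples, alpha), value_at_risk(upper_samples, alpha)
```

Because sorting preserves pairwise domination of the samples, any quantile (and hence VaR) of the lower-bound samples can never exceed that of the true returns, and symmetrically for the upper bound; this is the monotonicity that turns stochastic bounds into deterministic ones.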
Pages: 36