An Approach of Transforming Non-Markovian Reward to Markovian Reward

Cited: 0
Authors
Miao, Ruixuan [1 ,2 ]
Lu, Xu [1 ,2 ]
Cui, Jin [3 ]
Affiliations
[1] Xidian Univ, Inst Comp Theory & Technol, Xian, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian, Peoples R China
[3] Xian Shiyou Univ, Sch Comp Sci, Xian, Peoples R China
Source
STRUCTURED OBJECT-ORIENTED FORMAL LANGUAGE AND METHOD, SOFL+MSVL 2022 | 2023 / Vol. 13854
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Non-Markovian Reward; Reward Shaping; MDP; Temporal Logic;
DOI
10.1007/978-3-031-29476-1_2
Chinese Library Classification
TP31 [Computer software];
Subject Classification Codes
081202; 0835;
Abstract
In many decision-making problems, a rational reward function is required to correctly guide agents toward desirable behavior. For example, an intelligent robot needs to check its power before sweeping. Such reward functions depend on historical states rather than on the current state alone, and are referred to as non-Markovian rewards. However, state-of-the-art MDP (Markov Decision Process) planners support only Markovian rewards. In this paper, we present an approach that transforms a non-Markovian reward expressed in LTLf (Linear Temporal Logic over Finite Traces) into a Markovian reward. The LTLf formula is converted into an automaton, which is then compiled into a standard MDP model. The reward function of the resulting model is further optimized through reward shaping in order to speed up planning. The reshaped reward function can be exploited by MDP planners to guide search and produce good training results. Finally, experiments on augmented International Probabilistic Planning Competition (IPPC) domains demonstrate the effectiveness and feasibility of our approach; in particular, the reshaped reward function significantly improves planner performance.
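The pipeline the abstract describes (LTLf property → automaton → product with the MDP state → potential-based reward shaping) can be illustrated with a toy sketch. Everything below is our own illustration, not the paper's implementation: the two-transition DFA standing in for a compiled LTLf automaton (for the "check power before sweeping" example), the `PHI` potential values, and all function names are assumptions. Once the agent's state is augmented with the DFA state `q`, the reward depends only on the augmented current state, i.e., it is Markovian.

```python
# Toy DFA for an LTLf-style property: q0 --check--> q1 --clean--> q2 (accepting).
# In the paper's approach this automaton would come from an LTLf compiler.
DFA = {
    ("q0", "check"): "q1",
    ("q1", "clean"): "q2",
}
ACCEPTING = {"q2"}

def dfa_step(q, label):
    """Advance the DFA on an observed label; unspecified transitions self-loop."""
    return DFA.get((q, label), q)

def markovian_reward(q, label):
    """Reward 1.0 exactly when the augmented state first reaches acceptance.

    This depends only on (q, label), not on the full history: the history
    needed to evaluate the non-Markovian reward is summarized in q.
    """
    return 1.0 if dfa_step(q, label) in ACCEPTING and q not in ACCEPTING else 0.0

# Potential-based shaping F(q, q') = gamma * phi(q') - phi(q).
# The potentials below (rough "closeness to acceptance") are assumed values.
PHI = {"q0": 0.0, "q1": 0.5, "q2": 1.0}

def shaped_reward(q, label, gamma=0.99):
    """Markovian reward plus a shaping term that preserves optimal policies."""
    q_next = dfa_step(q, label)
    return markovian_reward(q, label) + gamma * PHI[q_next] - PHI[q]

# Trace: checking power and then cleaning drives the DFA to acceptance,
# and the shaping term hands out partial credit along the way.
q = "q0"
total = 0.0
for lab in ["check", "clean"]:
    total += shaped_reward(q, lab)
    q = dfa_step(q, lab)
```

The shaping term rewards intermediate automaton progress (reaching `q1` already pays `gamma * 0.5`), which is the mechanism by which a reshaped reward can speed up an MDP planner's search without changing which policies are optimal.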
Pages: 12-29
Number of pages: 18