SDRL: Interpretable and Data-Efficient Deep Reinforcement Learning Leveraging Symbolic Planning

Cited by: 0
Authors
Lyu, Daoming [1 ]
Yang, Fangkai [2 ]
Liu, Bo [1 ]
Gustafson, Steven [2 ]
Affiliations
[1] Auburn Univ, Auburn, AL 36849 USA
[2] Maana Inc, Bellevue, WA USA
Source
THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2019
Keywords
PLATFORM;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep reinforcement learning (DRL) has achieved great success by learning directly from high-dimensional sensory inputs, yet is notorious for its lack of interpretability. Interpretability of subtasks is critical in hierarchical decision-making, as it increases the transparency of black-box-style DRL approaches and helps RL practitioners better understand the high-level behavior of the system. In this paper, we introduce symbolic planning into DRL and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can handle both high-dimensional sensory inputs and symbolic planning. Task-level interpretability is enabled by relating symbolic actions to options. The framework features a planner-controller-meta-controller architecture, whose three components take charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, bringing together the long-term planning capability of symbolic knowledge and end-to-end reinforcement learning directly from high-dimensional sensory input. Experimental results validate the interpretability of the subtasks, along with improved data efficiency compared with state-of-the-art approaches.
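The planner-controller-meta-controller interplay described in the abstract can be illustrated, under assumptions about its interfaces, by the minimal Python sketch below. This is not the authors' implementation: the objects (planner, controllers, meta_controller) and their methods (plan, execute, update, intrinsic_rewards) are hypothetical names chosen for illustration only.

```python
def sdrl_training_loop(planner, controllers, meta_controller, env, episodes):
    """Hypothetical sketch of SDRL's alternation between symbolic planning,
    option learning, and subtask evaluation (names are assumptions)."""
    for _ in range(episodes):
        # Planner: schedule subtasks as a sequence of symbolic actions,
        # biased by the meta-controller's current subtask evaluations.
        plan = planner.plan(meta_controller.intrinsic_rewards())

        state = env.reset()
        for symbolic_action in plan:
            # Controller: each symbolic action corresponds to an option whose
            # policy is learned end-to-end from high-dimensional sensory input.
            option = controllers[symbolic_action]
            state, extrinsic_reward, achieved = option.execute(env, state)

            # Meta-controller: evaluate the subtask outcome and update the
            # intrinsic reward signal that shapes future symbolic plans.
            meta_controller.update(symbolic_action, extrinsic_reward, achieved)
```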
Pages: 2970-2977
Page count: 8