Interpretable policy derivation for reinforcement learning based on evolutionary feature synthesis

Cited by: 0

Authors:

Hengzhe Zhang

Aimin Zhou

Xin Lin

Affiliation:

[1] East China Normal University, Shanghai Key Laboratory of Multidimensional Information Processing, School of Computer Science and Technology

Source:

Complex & Intelligent Systems | 2020 / Vol. 6

Keywords:

Reinforcement learning; Genetic programming; Policy derivation; Explainable machine learning

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Reinforcement learning based on deep neural networks has attracted much attention and has been widely used in real-world applications. However, its black-box nature limits its use in high-stakes areas such as manufacturing and healthcare. To address this problem, some researchers resort to interpretable control policy generation algorithms. The basic idea is to use an interpretable model, such as tree-based genetic programming, to extract a policy from a black-box model, such as a neural network. Following this idea, in this paper we try another form of genetic programming, evolutionary feature synthesis, to extract the control policy from the neural network. We also propose an evolutionary method that automatically optimizes the operator set of the control policy for each specific problem. Moreover, a policy simplification strategy is introduced. We conduct experiments on four reinforcement learning environments. The experimental results reveal that, when extracting a policy from the neural network, evolutionary feature synthesis can achieve better performance than tree-based genetic programming with comparable interpretability.
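
The extraction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the black-box policy, the operator set, and every function name below are hypothetical, and an exhaustive search over a handful of one-step features stands in for the evolutionary search. The sketch samples states, queries the black-box policy for actions, builds candidate features from an assumed operator set, and keeps the feature whose linear fit reproduces the actions best.

```python
import itertools
import math
import random


def black_box_policy(x, v):
    """Hypothetical stand-in for a trained neural-network policy."""
    return 1.5 * x * v + 0.5


# Assumed operator set (illustrative only; the paper tunes this set per problem).
UNARY_OPS = {"sin": math.sin, "abs": abs}
BINARY_OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def candidate_features():
    """Yield (name, function) pairs built from the raw state variables."""
    raw = {"x": lambda x, v: x, "v": lambda x, v: v}
    yield from raw.items()
    for op_name, op in UNARY_OPS.items():
        for name, f in raw.items():
            yield f"{op_name}({name})", (lambda x, v, op=op, f=f: op(f(x, v)))
    for op_name, op in BINARY_OPS.items():
        for (n1, f1), (n2, f2) in itertools.combinations(raw.items(), 2):
            yield (
                f"{op_name}({n1},{n2})",
                lambda x, v, op=op, f1=f1, f2=f2: op(f1(x, v), f2(x, v)),
            )


def fit_line(zs, ys):
    """Least-squares fit of y = a * z + b; returns (a, b, mse)."""
    n = len(zs)
    mean_z, mean_y = sum(zs) / n, sum(ys) / n
    var_z = sum((z - mean_z) ** 2 for z in zs)
    cov_zy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
    a = cov_zy / var_z if var_z > 0 else 0.0
    b = mean_y - a * mean_z
    mse = sum((a * z + b - y) ** 2 for z, y in zip(zs, ys)) / n
    return a, b, mse


def extract_policy(n_samples=200, seed=0):
    """Pick the single synthesized feature that best imitates the black box."""
    rng = random.Random(seed)
    states = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_samples)]
    actions = [black_box_policy(x, v) for x, v in states]
    best = None
    for name, feature in candidate_features():
        zs = [feature(x, v) for x, v in states]
        a, b, mse = fit_line(zs, actions)
        if best is None or mse < best[3]:
            best = (name, a, b, mse)
    return best


name, a, b, mse = extract_policy()
print(f"action ~ {a:.2f} * {name} + {b:.2f} (mse={mse:.2e})")
```

The result is a one-line formula that a person can read directly, which is the sense of interpretability the abstract refers to. A fuller version would combine several synthesized features and evolve them over generations rather than enumerate them.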


Pages: 741-753

Number of pages: 12
