Interpreting a deep reinforcement learning model with conceptual embedding and performance analysis

Cited: 2
Authors
Dai, Yinglong [1 ,2 ]
Ouyang, Haibin [3 ]
Zheng, Hong [4 ]
Long, Han [1 ]
Duan, Xiaojun [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Liberal Arts & Sci, Changsha 410073, Hunan, Peoples R China
[2] Hunan Normal Univ, Hunan Prov Key Lab Intelligent Comp & Language In, Changsha 410081, Hunan, Peoples R China
[3] Guangzhou Univ, Sch Mech & Elect Engn, Guangzhou 510006, Guangdong, Peoples R China
[4] Hunan Normal Univ, Sch Phys & Elect, Changsha 410081, Hunan, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Deep reinforcement learning; Deep neural networks; Interpretability; Conceptual embedding; Perturbation; Causality; LEVEL; GAME
DOI
10.1007/s10489-022-03788-7
CLC Classification Code
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The weak interpretability of deep reinforcement learning (DRL) models is a serious impediment to deploying DRL agents in areas that demand high reliability. To interpret the behavior of a DRL agent, researchers use saliency maps to identify the parts of the agent's observation that influence its decisions. However, saliency maps still cannot explicitly present the cause and effect between an agent's observations and its actions. In this paper, we analyze the inference procedure of the DRL architecture and propose embedding interpretable intermediate representations into an agent's policy; these intermediate representations are compressed and abstracted for explanation. We utilize a conceptual embedding technique to regularize the latent representation space of the deep model so that it produces interpretable causal factors aligned with human concepts. Furthermore, the information loss of the intermediate representation is analyzed to define an upper bound on model performance and to measure performance degradation. Experiments validate the effectiveness of the proposed method and the relationship between the observation information and the agent's performance upper bound.
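The record contains no code, but the core idea described in the abstract, a concept bottleneck whose latent units are regularized toward human-labeled concepts, can be illustrated. The following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the ConceptBottleneckPolicy class, the layer sizes, and the assumption of supervised concept labels are all illustrative choices.

# Hypothetical sketch (not the paper's code): a policy network whose
# observation is first compressed into a small, human-interpretable
# concept vector; the policy then acts only on that vector.
import torch
import torch.nn as nn

class ConceptBottleneckPolicy(nn.Module):
    def __init__(self, obs_dim, n_concepts, n_actions):
        super().__init__()
        # Encoder compresses the observation into a few concept logits.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, n_concepts),
        )
        # The policy head sees only the concept vector, so every action
        # can be traced back to interpretable intermediate factors.
        self.policy_head = nn.Linear(n_concepts, n_actions)

    def forward(self, obs):
        concepts = torch.sigmoid(self.encoder(obs))  # each unit ~ one concept
        action_logits = self.policy_head(concepts)
        return action_logits, concepts

def loss_fn(action_logits, actions, concepts, concept_labels, alpha=0.5):
    # Task loss (here, behavior cloning on reference actions) plus an
    # alignment term that pins each latent unit to a labeled human concept.
    task = nn.functional.cross_entropy(action_logits, actions)
    align = nn.functional.binary_cross_entropy(concepts, concept_labels)
    return task + alpha * align

Because the policy acts only through the low-dimensional concept vector, any observation information the bottleneck discards caps the achievable return; this is the performance upper bound, and the measurable performance degradation, that the abstract refers to.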
Pages: 6936-6952
Number of pages: 17