EDGE: Explaining Deep Reinforcement Learning Policies

Cited by: 0
Authors
Guo, Wenbo [1 ]
Wu, Xian [1 ]
Khan, Usmann [2 ]
Xing, Xinyu [3 ]
Affiliations
[1] Penn State Univ, University Pk, PA 16802 USA
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
[3] Penn State Univ, University Pk, PA 16802 USA; Northwestern Univ, Evanston, IL USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34
Keywords
LEVEL;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
With the rapid development of deep reinforcement learning (DRL) techniques, there is an increasing need to understand and interpret DRL policies. While recent research has developed explanation methods that interpret how an agent determines individual moves, these methods cannot capture the importance of actions and states to a game's final result. In this work, we propose a novel self-explainable model that augments a Gaussian process with a customized kernel function and an interpretable predictor. Alongside the proposed model, we develop a parameter-learning procedure that leverages inducing points and variational inference to improve learning efficiency. With the proposed model, we can predict an agent's final rewards from its game episodes and extract time-step importance within episodes as strategy-level explanations for that agent. Through experiments on Atari and MuJoCo games, we verify the explanation fidelity of our method and demonstrate how the interpretations can be used to understand agent behavior, discover policy vulnerabilities, remediate policy errors, and even defend against adversarial attacks.
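The abstract names three ingredients: a Gaussian process with a customized kernel, an interpretable predictor, and parameter learning via inducing points and variational inference. The sketch below is a minimal stand-in for that pipeline, not the authors' EDGE implementation: it fits a sparse variational GP (GPyTorch) whose linear mean plays the role of the interpretable predictor, so its weights can be read as per-step importance. The class name, toy episode features, and synthetic rewards are all illustrative assumptions.

```python
# Hedged sketch of a sparse variational GP reward regressor, assuming
# per-step episode features as inputs. Not the authors' EDGE code.
import torch
import gpytorch


class SparseGPRewardModel(gpytorch.models.ApproximateGP):
    """Predicts an episode's final reward from a per-step feature vector."""

    def __init__(self, inducing_points):
        # Inducing points + variational inference keep GP training tractable,
        # mirroring the learning strategy named in the abstract.
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(0)
        )
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, variational_distribution,
            learn_inducing_locations=True,
        )
        super().__init__(variational_strategy)
        # A linear mean stands in for the paper's interpretable predictor:
        # its weights are directly readable as feature (time-step) importance.
        self.mean_module = gpytorch.means.LinearMean(inducing_points.size(-1))
        # Plain RBF kernel as a placeholder for the paper's customized kernel.
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


torch.manual_seed(0)
num_episodes, num_steps = 512, 8           # toy sizes; one feature per step
X = torch.randn(num_episodes, num_steps)   # stand-in episode representations
true_weights = torch.linspace(1.0, 0.1, num_steps)
y = X @ true_weights + 0.1 * torch.randn(num_episodes)  # synthetic final rewards

model = SparseGPRewardModel(inducing_points=X[:32].clone())
likelihood = gpytorch.likelihoods.GaussianLikelihood()
elbo = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=num_episodes)
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(likelihood.parameters()), lr=0.05
)

model.train()
likelihood.train()
for _ in range(300):
    optimizer.zero_grad()
    loss = -elbo(model(X), y)   # maximize the ELBO
    loss.backward()
    optimizer.step()

# Reading the linear mean's weights gives a crude per-step importance score,
# analogous in spirit to the strategy-level explanations described above.
step_importance = model.mean_module.weights.detach().squeeze()
print(step_importance)
```

On this synthetic data the recovered weights should roughly track `true_weights`, i.e., earlier steps matter more; in the paper's setting the analogous readout identifies which time steps within an episode drive the final reward.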
Pages: 15
Related Papers
50 records in total (first 10 shown)
  • [1] DEEP REINFORCEMENT LEARNING FOR TRANSFER OF CONTROL POLICIES
    Cunningham, James D.
    Miller, Simon W.
    Yukish, Michael A.
    Simpson, Timothy W.
    Tucker, Conrad S.
    PROCEEDINGS OF THE ASME INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, 2019, VOL 2A, 2020
  • [2] Verified Probabilistic Policies for Deep Reinforcement Learning
    Bacci, Edoardo
    Parker, David
    NASA FORMAL METHODS (NFM 2022), 2022, 13260 : 193 - 212
  • [3] Discovering symbolic policies with deep reinforcement learning
    Landajuela, Mikel
    Petersen, Brenden K.
    Kim, Sookyung
    Santiago, Claudio P.
    Glatt, Ruben
    Mundhenk, T. Nathan
    Pettit, Jacob F.
    Faissol, Daniel M.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [4] Deep Reinforcement Learning at the Edge of the Statistical Precipice
    Agarwal, Rishabh
    Schwarzer, Max
    Castro, Pablo Samuel
    Courville, Aaron
    Bellemare, Marc G.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [5] Explaining Black Box Reinforcement Learning Agents Through Counterfactual Policies
    Movin, Maria
    Dinis Junior, Guilherme
    Hollmen, Jaakko
    Papapetrou, Panagiotis
    ADVANCES IN INTELLIGENT DATA ANALYSIS XXI, IDA 2023, 2023, 13876 : 314 - 326
  • [6] Learning Intention-Aware Policies in Deep Reinforcement Learning
    Zhao, T.
    Wu, S.
    Li, G.
    Chen, Y.
    Niu, G.
    Sugiyama, Masashi
    NEURAL COMPUTATION, 2023, 35 (10) : 1657 - 1677
  • [7] Learning Urban Driving Policies using Deep Reinforcement Learning
    Agarwal, Tanmay
    Arora, Hitesh
    Schneider, Jeff
    2021 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2021, : 607 - 614
  • [8] StateMask: Explaining Deep Reinforcement Learning through State Mask
    Cheng, Zelei
    Wu, Xian
    Yu, Jiahao
    Sun, Wenhai
    Guo, Wenbo
    Xing, Xinyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [9] Reinforcement Learning with Deep Energy-Based Policies
    Haarnoja, Tuomas
    Tang, Haoran
    Abbeel, Pieter
    Levine, Sergey
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [10] Autoregressive Policies for Continuous Control Deep Reinforcement Learning
    Korenkevych, Dmytro
    Mahmood, A. Rupam
    Vasan, Gautham
    Bergstra, James
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2754 - 2762