Hardness in Markov Decision Processes: Theory and Practice

Cited by: 0
Authors
Conserva, Michelangelo [1 ]
Rauber, Paulo [1 ]
Affiliations
[1] Queen Mary University of London, London, England
Source
Advances in Neural Information Processing Systems 35 (NeurIPS 2022) | 2022
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Meticulously analysing the empirical strengths and weaknesses of reinforcement learning methods in hard (challenging) environments is essential to inspire innovations and assess progress in the field. In tabular reinforcement learning, there is no well-established standard selection of environments to conduct such analysis, which is partially due to the lack of a widespread understanding of the rich theory of hardness of environments. The goal of this paper is to unlock the practical usefulness of this theory through four main contributions. First, we present a systematic survey of the theory of hardness, which also identifies promising research directions. Second, we introduce Colosseum, a pioneering package that enables empirical hardness analysis and implements a principled benchmark composed of environments that are diverse with respect to different measures of hardness. Third, we present an empirical analysis that provides new insights into computable measures. Finally, we benchmark five tabular agents in our newly proposed benchmark. While advancing the theoretical understanding of hardness in non-tabular reinforcement learning remains essential, our contributions in the tabular setting are intended as solid steps towards a principled non-tabular benchmark. Accordingly, we benchmark four agents in non-tabular versions of Colosseum environments, obtaining results that demonstrate the generality of tabular hardness measures.
Pages: 15