Model-based Reinforcement Learning and the Eluder Dimension

Cited by: 0
Authors
Osband, Ian [1]
Van Roy, Benjamin [1]
Affiliation
[1] Stanford Univ, Stanford, CA 94305 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014) | 2014 / Vol. 27
Funding
National Science Foundation (US);
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of learning to optimize an unknown Markov decision process (MDP). We show that, if the MDP can be parameterized within some known function class, we can obtain regret bounds that scale with the dimensionality, rather than cardinality, of the system. We characterize this dependence explicitly as $\tilde{O}(\sqrt{d_K d_E T})$, where $T$ is the time elapsed, $d_K$ is the Kolmogorov dimension, and $d_E$ is the eluder dimension. These represent the first unified regret bounds for model-based reinforcement learning and provide state-of-the-art guarantees in several important settings. Moreover, we present a simple and computationally efficient algorithm, posterior sampling for reinforcement learning (PSRL), that satisfies these bounds.
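To make the PSRL procedure described in the abstract concrete, here is a minimal sketch of posterior sampling on a finite tabular MDP. The conjugate Dirichlet/Beta priors, the fixed episode horizon H, and the env.reset()/env.step() interface are illustrative assumptions made for this sketch, not details taken from the paper.

import numpy as np

def solve_mdp(P, R, H):
    """Finite-horizon value iteration on a sampled MDP.

    P: transition tensor of shape (S, A, S); R: mean-reward matrix (S, A).
    Returns a greedy policy, one action per (step, state).
    """
    S, A = R.shape
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V                # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] * V[s']
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

def psrl(env, S, A, H, episodes):
    """PSRL sketch: each episode, sample one MDP from the posterior,
    act greedily against the sample, then update the posterior.
    Assumes Bernoulli rewards in {0, 1} and an assumed env interface
    with reset() -> state and step(action) -> (next_state, reward)."""
    trans_counts = np.ones((S, A, S))   # Dirichlet(1, ..., 1) prior per (s, a)
    reward_ab = np.ones((S, A, 2))      # Beta(1, 1) prior per (s, a)
    for _ in range(episodes):
        # 1. Draw one plausible MDP from the current posterior.
        P = np.array([[np.random.dirichlet(trans_counts[s, a])
                       for a in range(A)] for s in range(S)])
        R = np.random.beta(reward_ab[..., 0], reward_ab[..., 1])
        # 2. Solve the sampled MDP and follow its optimal policy for one episode.
        policy = solve_mdp(P, R, H)
        s = env.reset()
        for h in range(H):
            a = policy[h, s]
            s_next, r = env.step(a)
            # 3. Conjugate posterior updates from the observed transition and reward.
            trans_counts[s, a, s_next] += 1
            reward_ab[s, a, 0] += r
            reward_ab[s, a, 1] += 1 - r
            s = s_next

Sampling an entire MDP and acting greedily against it is what drives exploration here: state-action pairs whose dynamics are still uncertain occasionally look optimistic in the sampled model, so the agent visits them and sharpens its posterior.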
Pages: 9
Related Papers
50 records in total
[41] Tajima, Yoshiyuki; Onisawa, Takehisa. Model-based reinforcement learning with model error and its application. Proceedings of SICE Annual Conference, Vols 1-8, 2007: 1333-1336.
[42] Yoshida, W; Ishii, S. Model-based reinforcement learning: a computational model and an fMRI study. Neurocomputing, 2005, 63: 253-269.
[43] Schena, Lorenzo; Marques, Pedro A.; Poletti, Romain; Van den Berghe, Jan; Mendez, Miguel A. Reinforcement Twinning: From digital twins to model-based reinforcement learning. Journal of Computational Science, 2024, 82.
[44] Agarwal, Alekh; Kakade, Sham; Yang, Lin F. Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal. Conference on Learning Theory, 2020, Vol. 125.
[45] Huh, Namjung; Jo, Suhyun; Kim, Hoseok; Sul, Jung Hoon; Jung, Min Whan. Model-based reinforcement learning under concurrent schedules of reinforcement in rodents. Learning & Memory, 2009, 16(5): 315-323.
[46] Kim, Hyeoneun; Lim, Woosang; Lee, Kanghoon; Noh, Yung-Kyun; Kim, Kee-Eung. Reward Shaping for Model-Based Bayesian Reinforcement Learning. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015: 3548-3555.
[47] Lin, Zichuan; Thomas, Garrett; Yang, Guangwen; Ma, Tengyu. Model-based Adversarial Meta-Reinforcement Learning. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020, Vol. 33.
[48] Zhang, Baohe; Rajan, Raghu; Pineda, Luis; Lambert, Nathan; Biedenkapp, Andre; Chua, Kurtland; Hutter, Frank; Calandra, Roberto. On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning. 24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021, Vol. 130.
[49] Kamalapurkar, Rushikesh; Walters, Patrick; Dixon, Warren E. Model-based reinforcement learning for approximate optimal regulation. Automatica, 2016, 64: 94-104.
[50] Lison, Pierre. Model-based Bayesian Reinforcement Learning for Dialogue Management. 14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013), Vols 1-5, 2013: 475-479.