INEXPERIENCED RL AGENTS CAN'T GET IT RIGHT: LOWER BOUNDS ON REGRET AT FINITE SAMPLE COMPLEXITY

被引：0

作者：

Fraser, Maia ^{[1
]}

Letourneau, Vincent ^{[1
]}

机构：

[1] Univ Ottawa, Dept Comp Sci, Ottawa, ON, Canada

来源：

CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 199 | 2022年 / 199卷

关键词：

MDPS;

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

We consider a family M of MDPs over given state and action spaces, and an agent that is sequentially confronted with tasks from M. Although stated for this stepwise change in distributions, the insight we develop is informative for continually changing distributions as well. In order to study how structure of M, viewed as a learning environment, impacts the learning efficiency of the agent, we formulate an RL analog of fat shattering dimension for MDP families and show that this implies a nontrivial lower bound on regret as long as insufficiently many steps have been taken. More precisely, for some constant c which depends on shattering d states, an inexperienced agent that has explored the learning environment for fewer than d steps will necessarily have regret above c on some MDP in the family.

引用

页数：8