INEXPERIENCED RL AGENTS CAN'T GET IT RIGHT: LOWER BOUNDS ON REGRET AT FINITE SAMPLE COMPLEXITY

被引:0
|
作者
Fraser, Maia [1 ]
Letourneau, Vincent [1 ]
机构
[1] Univ Ottawa, Dept Comp Sci, Ottawa, ON, Canada
来源
CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 199 | 2022年 / 199卷
关键词
MDPS;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We consider a family M of MDPs over given state and action spaces, and an agent that is sequentially confronted with tasks from M. Although stated for this stepwise change in distributions, the insight we develop is informative for continually changing distributions as well. In order to study how structure of M, viewed as a learning environment, impacts the learning efficiency of the agent, we formulate an RL analog of fat shattering dimension for MDP families and show that this implies a nontrivial lower bound on regret as long as insufficiently many steps have been taken. More precisely, for some constant c which depends on shattering d states, an inexperienced agent that has explored the learning environment for fewer than d steps will necessarily have regret above c on some MDP in the family.
引用
收藏
页数:8
相关论文
共 1 条
  • [1] Tight Query Complexity Lower Bounds for PCA via Finite Sample Deformed Wigner Law
    Simchowitz, Max
    El Alaoui, Ahmed
    Recht, Benjamin
    STOC'18: PROCEEDINGS OF THE 50TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING, 2018, : 1249 - 1259