Reward-Relevance-Filtered Linear Offline Reinforcement Learning

Cited by: 0
Authors
Zhou, Angela [1 ]
Affiliation
[1] Univ Southern Calif, Data Sci & Operat, Los Angeles, CA 90007 USA
Source
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238 | 2024 / Vol. 238
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification
081104; 0812; 0835; 1405
Abstract
This paper studies offline reinforcement learning with linear function approximation in a setting with decision-theoretic, but not estimation, sparsity. The structural restrictions on the data-generating process presume that the transitions factor into a sparse component that affects the reward and may also affect additional exogenous dynamics that do not affect the reward. Although the minimally sufficient adjustment set for estimating full-state transition properties depends on the whole state, the optimal policy, and therefore the state-action value function, depend only on the sparse component: we call this causal/decision-theoretic sparsity. We develop a method that filters estimation of the state-action value function to the sparse component via a modification of thresholded lasso in least-squares policy evaluation. We provide theoretical guarantees for our reward-filtered linear fitted Q-iteration, with sample complexity depending only on the size of the sparse component.
Pages: 17
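As a rough illustration of the reward-filtering idea described in the abstract, the sketch below combines a thresholded lasso on the observed rewards with least-squares fitted Q-iteration restricted to the selected coordinates. It is a minimal sketch under assumed linear features for state-action pairs; the helper names (select_support, reward_filtered_fqi), the use of scikit-learn's Lasso, and the hyperparameters alpha and tau are illustrative assumptions, not the paper's exact estimator or tuning.

import numpy as np
from sklearn.linear_model import Lasso

def select_support(X, r, alpha=0.1, tau=0.05):
    """Thresholded lasso (illustrative): regress rewards on features and
    keep the coordinates whose coefficient magnitude exceeds tau."""
    lasso = Lasso(alpha=alpha, fit_intercept=False).fit(X, r)
    return np.flatnonzero(np.abs(lasso.coef_) > tau)

def reward_filtered_fqi(X, r, X_next, support, gamma=0.99, n_iters=50):
    """Least-squares fitted Q-iteration on the reward-relevant support.
    X: (n, d) features of observed (s, a); X_next: (n, A, d) features of
    (s', a') for every candidate next action a'."""
    Xs = X[:, support]                       # filter to selected coordinates
    w = np.zeros(len(support))
    for _ in range(n_iters):
        # Bellman targets: r + gamma * max_{a'} Q(s', a'), with Q linear on the support
        q_next = X_next[:, :, support] @ w   # shape (n, A)
        y = r + gamma * q_next.max(axis=1)
        w, *_ = np.linalg.lstsq(Xs, y, rcond=None)  # OLS on filtered features
    return w

# Hypothetical usage with data already featurized as above:
# support = select_support(X, r)
# w_hat = reward_filtered_fqi(X, r, X_next, support)

The point of the sketch is only the two-stage structure: the lasso stage selects reward-relevant coordinates, and every subsequent regression in the dynamic-programming loop is run on that reduced feature set, so the regression dimension tracks the sparse component rather than the full state.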