PROJECTED STATE-ACTION BALANCING WEIGHTS FOR OFFLINE REINFORCEMENT LEARNING

Cited by: 1
Authors
Wang, Jiayi [1 ]
Qi, Zhengling [2 ]
Wong, Raymond K. W. [3 ]
Affiliations
[1] Univ Texas Dallas, Dept Math Sci, Richardson, TX 75083 USA
[2] George Washington Univ, Dept Decis Sci, Washington, DC 20052 USA
[3] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
Funding
National Science Foundation (USA)
Keywords
Infinite horizons; Markov decision process; Policy evaluation; Reinforcement learning; Dynamic treatment regimes; Rates; Convergence; Inference
DOI
10.1214/23-AOS2302
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
Off-policy evaluation is a fundamental and challenging problem in reinforcement learning (RL). This paper focuses on value estimation of a target policy based on pre-collected data generated from a possibly different policy, under the framework of infinite-horizon Markov decision processes. Motivated by the recently developed marginal importance sampling method in RL and the covariate balancing idea in causal inference, we propose a novel estimator with approximately projected state-action balancing weights for policy value estimation. We obtain the convergence rate of these weights and show that the proposed value estimator is asymptotically normal under technical conditions. In terms of asymptotics, our results scale with both the number of trajectories and the number of decision points per trajectory. Consequently, consistency can still be achieved with a limited number of subjects when the number of decision points diverges. In addition, we develop a necessary and sufficient condition for the well-posedness of the operator related to nonparametric Q-function estimation in the off-policy setting, which characterizes the difficulty of Q-function estimation and may be of independent interest. Numerical experiments demonstrate the promising performance of our proposed estimator.
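For intuition only, the sketch below shows how marginal importance sampling weights are typically used to form an off-policy value estimate once they have been fitted. It is not the authors' implementation: the weight function w_hat, the pooled data arrays, the self-normalization step, and the discount factor are all illustrative assumptions, and the paper's actual contribution, estimating the weights by approximately balancing state-action functions, is not reproduced here.

    # Conceptual sketch (not the paper's method) of a marginal importance
    # sampling value estimate. Assumes pooled transitions (s, a, r) drawn
    # from the behavior data and a fitted weight function w_hat(s, a) that
    # approximates the ratio of the target policy's discounted state-action
    # visitation density to the behavior data distribution.
    import numpy as np

    def mis_value_estimate(states, actions, rewards, w_hat, gamma=0.99):
        # Evaluate the fitted weights at each observed state-action pair.
        w = np.array([w_hat(s, a) for s, a in zip(states, actions)])
        w = w / w.mean()  # self-normalize; a common stabilizing choice
        # Weighted average reward, rescaled to the discounted-return scale:
        # V(pi) = E[sum_t gamma^t R_t] = E_{d_pi}[R] / (1 - gamma).
        return float((w * np.asarray(rewards)).mean() / (1.0 - gamma))

Roughly, the balancing idea requires the fitted weights to satisfy the empirical analogue of E[w(S, A)(f(S, A) - gamma * f(S', pi))] = (1 - gamma) * E[f(S_0, pi)] over a class of test functions f, where f(s, pi) averages f(s, a') over a' ~ pi(. | s); the sketch above only shows how such weights are used after fitting.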
Pages: 1639-1665 (27 pages)
Related Papers (50 total)
  • [41] Reinforcement Learning with Restrictions on the Action Set
    Bravo, Mario
    Faure, Mathieu
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2015, 53 (01) : 287 - 312
  • [42] Experiments with reinforcement learning in problems with continuous state and action spaces
    Santamaria, JC
    Sutton, RS
    Ram, A
    ADAPTIVE BEHAVIOR, 1997, 6 (02) : 163 - 217
  • [43] Learning Personalized Health Recommendations via Offline Reinforcement Learning
    Preuett, Larry
    PROCEEDINGS OF THE EIGHTEENTH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2024, 2024, : 1355 - 1357
  • [44] Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
    Rashidinejad, Paria
    Zhu, Banghua
    Ma, Cong
    Jiao, Jiantao
    Russell, Stuart
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2022, 68 (12) : 8156 - 8196
  • [45] Learning a Model-Free Robotic Continuous State-Action Task through Contractive Q-Network
    Davari, Mohammadjavad
    Alipour, Khalil
    Hadi, Alireza
    Tarvirdizadeh, Bahram
    2017 ARTIFICIAL INTELLIGENCE AND ROBOTICS (IRANOPEN), 2017, : 115 - 120
  • [46] Doubly constrained offline reinforcement learning for learning path recommendation
    Yun, Yue
    Dai, Huan
    An, Rui
    Zhang, Yupei
    Shang, Xuequn
    KNOWLEDGE-BASED SYSTEMS, 2024, 284
  • [47] Low-rank State-action Value-function Approximation
    Rozada, Sergio
    Tenorio, Victor
    Marques, Antonio G.
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 1471 - 1475
  • [48] Estimation of the Change of Agents Behavior Strategy Using State-Action History
    Uchida, Shihori
    Oba, Sigeyuki
    Ishii, Shin
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, PT II, 2017, 10614 : 100 - 107
  • [49] Compressive Features in Offline Reinforcement Learning for Recommender Systems
    Minh Pham
    Hung Nguyen
    Long Dang
    Nieves, Jennifer Adorno
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 5719 - 5726
  • [50] Integrating Offline Reinforcement Learning with Transformers for Sequential Recommendation
    Xi, Xumei
    Zhao, Yuke
    Liu, Quan
    Ouyang, Liwen
    Wu, Yang
    PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023, 2023, : 1103 - 1108