Q-learning with Nearest Neighbors

Cited by: 0
Authors
Shah, Devavrat [1 ,2 ]
Xie, Qiaomin [1 ]
Affiliations
[1] MIT, LIDS, Cambridge, MA 02139 USA
[2] MIT, Stat & Data Sci Ctr, Dept EECS, Cambridge, MA 02139 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018) | 2018 / Vol. 31
Keywords
MARKOV DECISION-PROCESSES; CONVERGENCE; APPROXIMATION; TREES;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path of the system, generated under an arbitrary policy, is available. We study the Nearest Neighbor Q-Learning (NNQL) algorithm, which learns the optimal Q-function using nearest neighbor regression. As the main contribution, we provide a tight finite-sample analysis of the convergence rate. In particular, for an MDP with a d-dimensional state space and discount factor γ ∈ (0, 1), given an arbitrary sample path with "covering time" L, we establish that the algorithm is guaranteed to output an ε-accurate estimate of the optimal Q-function using Õ(L/(ε³(1−γ)⁷)) samples. For instance, for a well-behaved MDP, the covering time of the sample path under a purely random policy scales as Õ(1/ε^d), so the sample complexity scales as Õ(1/ε^(d+3)). Indeed, we establish a lower bound showing that a dependence of Ω̃(1/ε^(d+2)) is necessary.
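To make the idea concrete, here is a minimal sketch of Q-learning with nearest-neighbor state aggregation on a 1-D continuous state space, trained from a single sample path under a uniformly random behavior policy, as in the setting the abstract describes. The anchor-point grid, the toy drift dynamics, the reward, and all hyperparameters below are illustrative assumptions, not the construction or step-size schedule analyzed in the paper.

```python
import numpy as np

# Illustrative sketch: nearest-neighbor Q-learning on the interval [0, 1].
# The MDP, anchors, and step sizes are toy assumptions for demonstration only.
rng = np.random.default_rng(0)

gamma = 0.9                             # discount factor γ ∈ (0, 1)
anchors = np.linspace(0.0, 1.0, 21)     # finite set of representative states
n_actions = 2
Q = np.zeros((len(anchors), n_actions))
counts = np.zeros_like(Q)               # per-(anchor, action) visit counts

def nearest(s):
    """Index of the anchor point closest to continuous state s."""
    return int(np.argmin(np.abs(anchors - s)))

def step(s, a):
    """Toy MDP: action 0 drifts left, action 1 drifts right; reward is the next state."""
    drift = 0.1 if a == 1 else -0.1
    s_next = float(np.clip(s + drift + 0.02 * rng.standard_normal(), 0.0, 1.0))
    return s_next, s_next               # reward equals the next state, so it peaks near 1

s = float(rng.random())
for _ in range(20000):                  # one long sample path
    a = int(rng.integers(n_actions))    # arbitrary (uniformly random) behavior policy
    s_next, r = step(s, a)
    i, j = nearest(s), nearest(s_next)
    counts[i, a] += 1
    alpha = 1.0 / counts[i, a]          # decaying step size at the visited anchor
    Q[i, a] += alpha * (r + gamma * Q[j].max() - Q[i, a])
    s = s_next

# States near the high-reward end should have larger learned value.
print(Q[nearest(0.95)].max() > Q[nearest(0.05)].max())  # → True
```

The random behavior policy here plays the role of the "arbitrary policy" in the abstract: how quickly the single path visits all anchor–action pairs is exactly the covering time L that governs the sample complexity bound.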
Pages: 11