Convergence of a Q-learning Variant for Continuous States and Actions

Cited by: 5
Author
Carden, Stephen [1 ]
Affiliation
[1] Clemson Univ, Dept Math Sci, Clemson, SC 29631 USA
Source
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH | 2014, Vol. 49
DOI
10.1613/jair.4271
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper presents a reinforcement learning algorithm for solving infinite-horizon Markov Decision Processes under the expected total discounted reward criterion when both the state and action spaces are continuous. The algorithm is based on Watkins' Q-learning, but uses Nadaraya-Watson kernel smoothing to generalize knowledge to unvisited states. As expected, continuity conditions must be imposed on the mean rewards and transition probabilities. Using results from kernel regression theory, the algorithm is proven capable of producing a Q-value function estimate that is uniformly within an arbitrary tolerance of the true Q-value function with probability one. The algorithm is then applied to an example problem to demonstrate convergence empirically.
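The abstract's central construction, Nadaraya-Watson smoothing over visited state-action pairs, can be written out explicitly. In the standard form of the estimator (the paper's exact targets and schedules may differ), the Q-value at an arbitrary pair (s, a) is a kernel-weighted average of bootstrapped targets built from the observed transitions (s_i, a_i, r_i, s'_i):

\[
\hat{Q}(s,a) \;=\; \frac{\sum_{i=1}^{n} K_h\big((s,a),(s_i,a_i)\big)\,\big(r_i + \gamma \max_{a'} \hat{Q}(s_i', a')\big)}{\sum_{i=1}^{n} K_h\big((s,a),(s_i,a_i)\big)},
\]

where K_h is a kernel with bandwidth h and γ is the discount factor. Below is a minimal Python sketch of this idea, not the paper's algorithm: the Gaussian kernel, the fixed bandwidth, the scalar state and action, and the finite action grid used to approximate the max over a continuous action space are all illustrative assumptions.

```python
import numpy as np

class KernelSmoothedQ:
    """Q-learning with Nadaraya-Watson smoothing over visited (s, a) pairs.

    Illustrative sketch only: assumes scalar states/actions, a Gaussian
    kernel, a fixed bandwidth, and a finite action grid for the max.
    """

    def __init__(self, gamma=0.9, bandwidth=0.2, action_grid=None):
        self.gamma = gamma
        self.h = bandwidth
        # Finite grid over the continuous action space, used only to
        # approximate max over a' when forming targets.
        self.action_grid = (np.linspace(-1.0, 1.0, 21)
                            if action_grid is None else action_grid)
        self.X = []  # visited (state, action) pairs
        self.q = []  # bootstrapped target for each visited pair

    def _weights(self, s, a):
        # Gaussian kernel weights between (s, a) and all stored pairs.
        d2 = np.sum((np.asarray(self.X) - np.array([s, a])) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.h ** 2))

    def value(self, s, a):
        # Nadaraya-Watson estimate: kernel-weighted average of the targets.
        if not self.X:
            return 0.0
        w = self._weights(s, a)
        denom = w.sum()
        return float(w @ np.asarray(self.q) / denom) if denom > 1e-12 else 0.0

    def greedy_value(self, s):
        # Approximate max over a' of Q(s, a') on the finite action grid.
        return max(self.value(s, a) for a in self.action_grid)

    def update(self, s, a, r, s_next):
        # One-step bootstrapped target, as in Watkins' Q-learning, then
        # store the sample so future estimates smooth over it.
        target = r + self.gamma * self.greedy_value(s_next)
        self.X.append([s, a])
        self.q.append(target)


# Toy usage on hypothetical 1-D dynamics (for illustration only).
agent = KernelSmoothedQ()
rng = np.random.default_rng(0)
s = 0.0
for _ in range(500):
    a = rng.uniform(-1.0, 1.0)                     # random exploration
    s_next = float(np.clip(s + 0.1 * a + 0.01 * rng.normal(), -1.0, 1.0))
    r = -s_next ** 2                               # reward: stay near 0
    agent.update(s, a, r, s_next)
    s = s_next
print(agent.greedy_value(0.0))
```

Note that this naive variant stores every sample, so each estimate costs O(n). The paper's guarantee of a uniformly accurate Q-value estimate with probability one additionally relies on the stated continuity conditions on mean rewards and transition probabilities, which this sketch does not attempt to implement.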
Pages: 705–731
Page count: 27