Convergence of a Q-learning Variant for Continuous States and Actions

Cited by: 5
Author
Carden, Stephen [1 ]
Affiliation
[1] Clemson Univ, Dept Math Sci, Clemson, SC 29631 USA
Source
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH | 2014, Vol. 49
DOI
10.1613/jair.4271
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper presents a reinforcement learning algorithm for solving infinite-horizon Markov Decision Processes under the expected total discounted reward criterion when both the state and action spaces are continuous. The algorithm is based on Watkins' Q-learning, but uses Nadaraya-Watson kernel smoothing to generalize knowledge to unvisited states. As expected, continuity conditions must be imposed on the mean rewards and transition probabilities. Using results from kernel regression theory, the algorithm is proven capable of producing a Q-value function estimate that is uniformly within an arbitrary tolerance of the true Q-value function with probability one. The algorithm is then applied to an example problem to demonstrate convergence empirically.
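The abstract's central construction, Nadaraya-Watson smoothing over visited state-action pairs, can be written out explicitly. In the standard form of the estimator (the paper's exact targets and schedules may differ), the Q-value at an arbitrary pair (s, a) is a kernel-weighted average of bootstrapped targets built from the observed transitions (s_i, a_i, r_i, s'_i):

\[
\hat{Q}(s,a) \;=\; \frac{\sum_{i=1}^{n} K_h\big((s,a),(s_i,a_i)\big)\,\big(r_i + \gamma \max_{a'} \hat{Q}(s_i', a')\big)}{\sum_{i=1}^{n} K_h\big((s,a),(s_i,a_i)\big)},
\]

where K_h is a kernel with bandwidth h and γ is the discount factor. Below is a minimal Python sketch of this idea, not the paper's algorithm: the Gaussian kernel, the fixed bandwidth, the scalar state and action, and the finite action grid used to approximate the max over a continuous action space are all illustrative assumptions.

```python
import numpy as np

class KernelSmoothedQ:
    """Q-learning with Nadaraya-Watson smoothing over visited (s, a) pairs.

    Illustrative sketch only: assumes scalar states/actions, a Gaussian
    kernel, a fixed bandwidth, and a finite action grid for the max.
    """

    def __init__(self, gamma=0.9, bandwidth=0.2, action_grid=None):
        self.gamma = gamma
        self.h = bandwidth
        # Finite grid over the continuous action space, used only to
        # approximate max over a' when forming targets.
        self.action_grid = (np.linspace(-1.0, 1.0, 21)
                            if action_grid is None else action_grid)
        self.X = []  # visited (state, action) pairs
        self.q = []  # bootstrapped target for each visited pair

    def _weights(self, s, a):
        # Gaussian kernel weights between (s, a) and all stored pairs.
        d2 = np.sum((np.asarray(self.X) - np.array([s, a])) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.h ** 2))

    def value(self, s, a):
        # Nadaraya-Watson estimate: kernel-weighted average of the targets.
        if not self.X:
            return 0.0
        w = self._weights(s, a)
        denom = w.sum()
        return float(w @ np.asarray(self.q) / denom) if denom > 1e-12 else 0.0

    def greedy_value(self, s):
        # Approximate max over a' of Q(s, a') on the finite action grid.
        return max(self.value(s, a) for a in self.action_grid)

    def update(self, s, a, r, s_next):
        # One-step bootstrapped target, as in Watkins' Q-learning, then
        # store the sample so future estimates smooth over it.
        target = r + self.gamma * self.greedy_value(s_next)
        self.X.append([s, a])
        self.q.append(target)


# Toy usage on hypothetical 1-D dynamics (for illustration only).
agent = KernelSmoothedQ()
rng = np.random.default_rng(0)
s = 0.0
for _ in range(500):
    a = rng.uniform(-1.0, 1.0)                     # random exploration
    s_next = float(np.clip(s + 0.1 * a + 0.01 * rng.normal(), -1.0, 1.0))
    r = -s_next ** 2                               # reward: stay near 0
    agent.update(s, a, r, s_next)
    s = s_next
print(agent.greedy_value(0.0))
```

Note that this naive variant stores every sample, so each estimate costs O(n). The paper's guarantee of a uniformly accurate Q-value estimate with probability one additionally relies on the stated continuity conditions on mean rewards and transition probabilities, which this sketch does not attempt to implement.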
Pages: 705–731
Page count: 27