A temporal-difference learning method using Gaussian state representation for continuous state space problems

Cited by: 0
Authors
Affiliation
[1] Graduate School of Engineering, Osaka City University
Source
Transactions of the Japanese Society for Artificial Intelligence, 2014, Vol. 29
Keywords
Continuous state spaces; Gaussian state representation; Reinforcement learning; TD learning;
DOI
10.1527/tjsai.29.157
Abstract
In this paper, we tackle the problem of reinforcement learning (RL) in a continuous state space. An appropriate discretization of the space can make many learning tasks tractable. A method using Gaussian state representation and the Rational Policy Making algorithm (RPM) has been proposed for this problem. That method discretizes the space by constructing a chain of states that represents a path to the goal, exploiting the agent's past experiences of reaching it. Because it exploits successful experiences strongly, it can find a rational solution quickly in an environment with little noise; in a noisy environment, however, it creates many unnecessary and distracting states and performs the task poorly. For learning in such an environment, we introduce the concept of a state value into the above method and develop a new method that uses a temporal-difference (TD) learning algorithm to learn the values of states. The value of a state is used to determine the size of the state, so the developed method can trim and eliminate unnecessary and distracting states quickly and learn the task well even in a noisy environment. We show the effectiveness of our method through computer simulations of a path-finding task and a cart-pole swing-up task. © The Japanese Society for Artificial Intelligence 2014.
Pages: 157 - 167
Number of pages: 10
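The abstract describes the mechanism only at a high level: TD learning assigns a value to each Gaussian state, and that value controls the state's size so that distracting states can be trimmed away. The record does not give the paper's actual update or resizing rules, so the following is only a minimal sketch of the idea, assuming a plain TD(0) value update over hypothetical Gaussian state units and a toy resizing rule; the GaussianState class, function names, shrink factor, and value threshold are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np


class GaussianState:
    """Hypothetical Gaussian state unit: a centre mu and width sigma in the
    continuous state space, plus a learned value estimate V."""

    def __init__(self, mu, sigma, value=0.0):
        self.mu = np.asarray(mu, dtype=float)
        self.sigma = float(sigma)
        self.value = float(value)

    def activation(self, x):
        # Gaussian membership of observation x in this state.
        d = np.linalg.norm(np.asarray(x, dtype=float) - self.mu)
        return np.exp(-0.5 * (d / self.sigma) ** 2)


def td0_update(prev_state, next_state, reward, alpha=0.1, gamma=0.9):
    """One TD(0) step: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    td_error = reward + gamma * next_state.value - prev_state.value
    prev_state.value += alpha * td_error
    return td_error


def resize_state(state, shrink=0.95, min_sigma=1e-3, threshold=0.0):
    # Toy version of "the value of a state determines its size":
    # a state whose value stays at or below the threshold is shrunk,
    # so low-value (distracting) states fade out of the representation.
    if state.value <= threshold:
        state.sigma = max(min_sigma, state.sigma * shrink)


if __name__ == "__main__":
    # Three states on a 1-D space; `goal` is assumed to already have high value.
    goal = GaussianState(mu=[1.0], sigma=0.5, value=1.0)
    s_on_path = GaussianState(mu=[0.5], sigma=0.5)
    s_distract = GaussianState(mu=[-1.0], sigma=0.5)   # off the path to the goal

    td0_update(s_on_path, goal, reward=0.0)             # value rises toward gamma * V(goal)
    td0_update(s_distract, s_distract, reward=-0.1)     # noisy transition, value drifts down

    for s in (s_on_path, s_distract):
        resize_state(s)

    print(s_on_path.value, s_on_path.sigma)     # value > 0, size kept
    print(s_distract.value, s_distract.sigma)   # value <= 0, size shrunk
```

In this sketch, a state whose value estimate remains non-positive is gradually shrunk, which loosely mirrors the abstract's point that unnecessary and distracting states are trimmed and eliminated.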
Related papers
50 records in total
  • [31] Reinforcement distribution in continuous state action space fuzzy Q-learning: A novel approach
    Bonarini, A
    Montrone, F
    Restelli, M
    FUZZY LOGIC AND APPLICATIONS, 2006, 3849 : 40 - 45
  • [32] Multi-Robot Cooperation Based on Continuous Reinforcement Learning with Two State Space Representations
    Yasuda, Toshiyuki
    Ohkura, Kazuhiro
    Yamada, Kazuaki
    2013 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2013), 2013, : 4470 - 4475
  • [33] Automated synthesis of steady-state continuous processes using reinforcement learning
    Quirin Göttl
    Dominik G. Grimm
    Jakob Burger
    Frontiers of Chemical Science and Engineering, 2022, 16 : 288 - 302
  • [34] Automated synthesis of steady-state continuous processes using reinforcement learning
    Göttl, Quirin
    Grimm, Dominik G.
    Burger, Jakob
    Frontiers of Chemical Science and Engineering, 2022, 16 (02) : 288 - 302
  • [35] Q-Learning in Continuous State-Action Space with Noisy and Redundant Inputs by Using a Selective Desensitization Neural Network
    Kobayashi, Takaaki
    Shibuya, Takeshi
    Morita, Masahiko
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2015, 19 (06) : 825 - 832
  • [36] Automated synthesis of steady-state continuous processes using reinforcement learning
    Goettl, Quirin
    Grimm, Dominik G.
    Burger, Jakob
    FRONTIERS OF CHEMICAL SCIENCE AND ENGINEERING, 2022, 16 (02) : 288 - 302
  • [37] A Novel State Space Exploration Method for the Sparse-Reward Reinforcement Learning Environment
    Liu, Xi
    Ma, Long
    Chen, Zhen
    Zheng, Changgang
    Chen, Ren
    Liao, Yong
    Yang, Shufan
    ARTIFICIAL INTELLIGENCE XL, AI 2023, 2023, 14381 : 216 - 221
  • [38] An adjustment method of the number of states on Q-Learning segmenting state space adaptively
    Hamagami, Tomoki
    Koakutsu, Seiichi
    Hirata, Hironori
    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART II-ELECTRONICS, 2007, 90 (09) : 75 - 86
  • [39] An adjustment method of the number of states on Q-learning segmenting state space adaptively
    Hamagami, T
    Hirata, H
    2003 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-5, CONFERENCE PROCEEDINGS, 2003, : 3062 - 3067
  • [40] Distributed Off-Policy Temporal Difference Learning Using Primal-Dual Method
    Lee, Donghwan
    Kim, Do Wan
    Hu, Jianghai
    IEEE ACCESS, 2022, 10 : 107077 - 107094