A temporal-difference learning method using Gaussian state representation for continuous state space problems

Cited: 0
Authors
Institution
[1] Graduate School of Engineering, Osaka City University
Source
Japanese Society for Artificial Intelligence, Vol. 29 (2014)
Keywords
Continuous state spaces; Gaussian state representation; Reinforcement learning; TD learning;
DOI
10.1527/tjsai.29.157
Abstract
In this paper, we tackle reinforcement learning (RL) in a continuous state space. An appropriate discretization of the space can make many learning tasks tractable. A method using Gaussian state representation and the Rational Policy Making (RPM) algorithm has been proposed for this problem. It discretizes the space by constructing a chain of states representing a path to the agent's goal, exploiting past experiences of reaching it. Because this method relies heavily on successful experiences, it can find a rational solution quickly in an environment with little noise; in a noisy environment, however, it generates many unnecessary and distracting states and performs the task poorly. For learning in such environments, we introduce the concept of the value of a state into the above method and develop a new method that uses a temporal-difference (TD) learning algorithm to learn state values. The value of a state is used to determine the size of that state, so the method can quickly trim and eliminate unnecessary and distracting states and learn the task well even in a noisy environment. We show the effectiveness of our method through computer simulations of a path-finding task and a cart-pole swing-up task. © The Japanese Society for Artificial Intelligence 2014.
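The building block the abstract refers to is TD learning of state values. A minimal sketch of a tabular TD(0) value update on a toy chain of states (this is an illustration of the general technique, not the authors' Gaussian-state algorithm; the state names, reward, learning rate, and discount factor are assumptions):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) update of the estimated value of state s.

    V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]
    """
    # TD error: reward plus discounted value of the successor state,
    # minus the current estimate for s.
    td_error = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * td_error
    return td_error

# Toy chain of states leading to a goal, loosely analogous to the
# paper's path-finding setting (states and rewards are made up).
V = {}
for _ in range(200):
    td0_update(V, "s1", 0.0, "s2")
    td0_update(V, "s2", 1.0, "goal")  # reward received on reaching the goal
```

After repeated updates, V["s2"] approaches 1.0 and V["s1"] approaches gamma * V["s2"] = 0.9; in the paper's method such learned values would then govern how states are resized or eliminated.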
Pages: 157 - 167
Page count: 10
Related Papers
50 in total
  • [1] Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach
    Jia, Yanwei
    Zhou, Xun Yu
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [2] Using temporal-difference learning for multi-agent bargaining
    Huang, Shiu-li
    Lin, Fu-ren
    ELECTRONIC COMMERCE RESEARCH AND APPLICATIONS, 2008, 7 (04) : 432 - 442
  • [3] Striatal and Tegmental Neurons Code Critical Signals for Temporal-Difference Learning of State Value in Domestic Chicks
    Wen, Chentao
    Ogura, Yukiko
    Matsushima, Toshiya
    FRONTIERS IN NEUROSCIENCE, 2016, 10
  • [4] IMPROVING REINFORCEMENT LEARNING USING TEMPORAL-DIFFERENCE NETWORK
    Karbasian, Habib
    Ahmadabadi, Majid N.
    Araabi, Babak N.
    EUROCON 2009: INTERNATIONAL IEEE CONFERENCE DEVOTED TO THE 150TH ANNIVERSARY OF ALEXANDER S. POPOV, VOLS 1-4, PROCEEDINGS, 2009, : 1716 - 1722
  • [5] Swarm Reinforcement Learning Methods for Problems with Continuous State-Action Space
    Iima, Hitoshi
    Kuroe, Yasuaki
    Emoto, Kazuo
    2011 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2011, : 2173 - 2180
  • [6] A state space filter for reinforcement learning in POMDPs - Application to a continuous state space -
    Nagayoshi, Masato
    Murao, Hajime
    Tamaki, Hisashi
    2006 SICE-ICASE INTERNATIONAL JOINT CONFERENCE, VOLS 1-13, 2006, : 3098 - +
  • [7] Gradient temporal-difference learning for off-policy evaluation using emphatic weightings
    Cao, Jiaqing
    Liu, Quan
    Zhu, Fei
    Fu, Qiming
    Zhong, Shan
    INFORMATION SCIENCES, 2021, 580 : 311 - 330
  • [8] Particle swarm optimization based on temporal-difference learning for solving multi-objective optimization problems
    Zhang, Desong
    Zhu, Guangyu
    COMPUTING, 2023, 105 (08) : 1795 - 1820
  • [9] Reinforcement Learning Method for Continuous State Space Based on Dynamic Neural Network
    Sun, Wei
    Wang, Xuesong
    Cheng, Yuhu
    2008 7TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-23, 2008, : 750 - 754