Q-learning in Continuous State-Action Space with Redundant Dimensions by Using a Selective Desensitization Neural Network

Cited: 0
Authors
Kobayashi, Takaaki [1 ]
Shibuya, Takeshi [2 ]
Morita, Masahiko [2 ]
Affiliations
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki 3058573, Japan
[2] Univ Tsukuba, Fac Engn Informat & Syst, Tsukuba, Ibaraki 3058573, Japan
Source
2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS) | 2014年
Keywords
DOI
Not available
CLC number
TP301 [Theory and Methods];
Subject classification
081202;
Abstract
When applying reinforcement learning algorithms such as Q-learning to real-world problems, we must consider the high, often redundant, dimensionality and the continuity of the state-action space. The continuity of the state-action space is commonly handled by value function approximation. However, conventional function approximators such as radial basis function networks (RBFNs) are unsuitable in these environments because they incur high computational cost and because the number of required experiences grows exponentially with the dimension of the state-action space. By contrast, the selective desensitization neural network (SDNN) is highly robust to redundant inputs and has low computational cost. This paper proposes a novel function approximation method for Q-learning in continuous state-action space based on SDNN. The proposed method is evaluated in numerical experiments with redundant inputs. The experimental results demonstrate that the proposed method is robust to redundant state dimensions and has a lower computational cost than an RBFN. These properties are advantageous for real-world applications such as robotic systems.
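The abstract contrasts SDNN-based approximation with conventional RBFN value function approximation for Q-learning. As background, here is a minimal, illustrative sketch of the RBFN baseline the paper argues against: Q-learning with a linear approximator over Gaussian radial basis features for a continuous 1-D state and discrete actions. This is not the authors' SDNN method; the class, parameter names, and toy setting are invented for illustration.

```python
import numpy as np

class RBFQApproximator:
    """Q-learning with a linear function over Gaussian RBF features.

    Note: a fixed grid of RBF centers must cover the state space, so the
    number of centers grows exponentially with the state dimension. This
    is the scaling problem the paper attributes to conventional RBFNs.
    """

    def __init__(self, centers, width, n_actions, alpha=0.1, gamma=0.9):
        self.centers = np.asarray(centers, dtype=float)  # RBF centers over the state space
        self.width = width                               # common Gaussian width
        self.n_actions = n_actions
        self.alpha = alpha                               # learning rate
        self.gamma = gamma                               # discount factor
        # One weight vector per discrete action.
        self.w = np.zeros((n_actions, len(self.centers)))

    def features(self, s):
        # Normalized Gaussian basis activations for a scalar state s.
        phi = np.exp(-((s - self.centers) ** 2) / (2.0 * self.width ** 2))
        return phi / phi.sum()

    def q(self, s, a):
        # Approximate action value: Q(s, a) = w_a . phi(s).
        return float(self.w[a] @ self.features(s))

    def update(self, s, a, r, s_next):
        # Standard Q-learning TD target: greedy value at the next state.
        target = r + self.gamma * max(self.q(s_next, b) for b in range(self.n_actions))
        td_error = target - self.q(s, a)
        # Gradient step on the weights of the taken action only.
        self.w[a] += self.alpha * td_error * self.features(s)
        return td_error
```

A redundant state dimension would, in this scheme, require replicating the center grid along that dimension, multiplying both the feature count and the experiences needed; SDNN's robustness to redundant inputs is what the paper proposes to exploit instead.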
Pages: 801-806
Page count: 6