Cognitively inspired reinforcement learning architecture and its application to giant-swing motion control

Cited by: 4
Authors
Uragami, Daisuke [1 ]
Takahashi, Tatsuji [2 ]
Matsuo, Yoshiki [1 ]
Affiliations
[1] Tokyo Univ Technol, Sch Comp Sci, Hachioji, Tokyo 1920982, Japan
[2] Tokyo Denki Univ, Sch Sci & Technol, Hiki, Saitama 3500394, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
Q-learning; Exploration-exploitation dilemma; Bio-inspired computing; Cognitive bias; Loosely symmetric model; Acrobot; Multi-armed bandit problems; ACQUISITION; MODEL; BEHAVIOR; MAP;
DOI
10.1016/j.biosystems.2013.11.002
CLC Classification
Q [Biological Sciences];
Subject Classification Code
07 ; 0710 ; 09 ;
Abstract
Many algorithms and methods in artificial intelligence and machine learning are inspired by human cognition. As a mechanism for handling the exploration-exploitation dilemma in reinforcement learning, the loosely symmetric (LS) value function, which models human causal intuition, was proposed (Shinohara et al., 2007). LS shows the highest correlation with human causal induction, and it has been reported to work effectively in multi-armed bandit problems, the simplest class of tasks exhibiting the dilemma. However, the scope of application of LS was limited to reinforcement learning problems with K actions and only one state (K-armed bandit problems). This study proposes the LS-Q learning architecture, which can handle general reinforcement learning tasks with multiple states and delayed reward. We tested the learning performance of the new architecture on giant-swing robot motion learning, where the uncertainty and unknown-ness of the environment are large. In the tests, no ready-made internal model or function approximation of the state space was provided. The simulations showed that while the ordinary Q-learning agent fails to acquire the giant-swing motion because of stagnant loops (local optima with low rewards), LS-Q escapes such loops and acquires the giant swing. We confirmed that the smaller the number of states (that is, the more coarse-grained the division of the state space and the more incomplete the state observation), the better LS-Q performs in comparison with Q-learning. We also showed that the high performance of LS-Q depends comparatively little on parameter tuning and learning time. This suggests that the proposed method, inspired by human cognition, works adaptively in real environments. (C) 2013 Elsevier Ireland Ltd. All rights reserved.
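The abstract does not state the LS value function itself. As a rough illustration only (not the authors' implementation), the sketch below applies an LS-style value to a two-armed bandit, the single-state setting the abstract says LS was originally limited to. The specific algebraic form used here, LS(a, b, c, d) = (a + bd/(b+d)) / (a + bd/(b+d) + b + ac/(a+c)), where a, b are the chosen action's reward/no-reward counts and c, d are the alternative action's, is one commonly cited statement of the Shinohara et al. (2007) model; it, and the function names `ls_value` and `choose`, are assumptions that should be checked against the original papers.

```python
import random


def ls_value(a, b, c, d):
    """LS-style value of an action (assumed form; verify against
    Shinohara et al., 2007).

    a, b: reward / no-reward counts of this action
    c, d: reward / no-reward counts of the alternative action
    """
    bd = b * d / (b + d) if (b + d) > 0 else 0.0
    ac = a * c / (a + c) if (a + c) > 0 else 0.0
    num = a + bd
    den = num + b + ac
    # With no observations at all, fall back to an uninformative value.
    return num / den if den > 0 else 0.5


def choose(counts):
    """Greedy choice between two arms by LS value (random on ties).

    counts: [(a0, b0), (a1, b1)] reward / no-reward counts per arm.
    """
    (a0, b0), (a1, b1) = counts
    v0 = ls_value(a0, b0, a1, b1)
    v1 = ls_value(a1, b1, a0, b0)
    if v0 == v1:
        return random.randrange(2)
    return 0 if v0 > v1 else 1
```

The point of the construction is that each action's value depends on the counts of the *other* action as well as its own, which keeps under-sampled alternatives attractive and removes the need for an explicit exploration parameter such as epsilon; in LS-Q, by the abstract's description, an analogous transformation is applied to the action values at each state before action selection.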
Pages: 1-9
Page count: 9
Related Papers
50 items in total
  • [41] Lateral Motion Control of a Maneuverable Aircraft Using Reinforcement Learning
    Tiumentsev, Yu. V.
    Zarubin, R. A.
    OPTICAL MEMORY AND NEURAL NETWORKS, 2024, 33 (01) : 1 - 12
  • [42] A Reinforcement Learning Method for Motion Control With Constraints on an HPN Arm
    Gan, Yinghao
    Li, Peijin
    Jiang, Hao
    Wang, Gaotian
    Jin, Yusong
    Chen, XiaoPing
    Ji, Jianmin
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2022, 7 (04) : 12006 - 12013
  • [44] A new golf swing robot to simulate human skill-accuracy improvement of swing motion by learning control
    Ming, A
    Kajitani, M
    MECHATRONICS, 2003, 13 (8-9) : 809 - 823
  • [45] Research on Motion Control Method of Manipulator Based on Reinforcement Learning
    Yang, Bo
    Wang, Kun
    Ma, Xiangxiang
    Fan, Biao
    Xu, Lei
    Yan, Hao
    Computer Engineering and Applications, 2023, 59 (06) : 318 - 325
  • [46] Concepts and facilities of a neural reinforcement learning control architecture for technical process control
    Riedmiller, M
    NEURAL COMPUTING & APPLICATIONS, 1999, 8 (04) : 323 - 338
  • [48] A general motion control architecture for an autonomous underwater vehicle with actuator faults and unknown disturbances through deep reinforcement learning
    Huang, Fei
    Xu, Jian
    Yin, Liangang
    Wu, Di
    Cui, Yunfei
    Yan, Zheping
    Chen, Tao
    OCEAN ENGINEERING, 2022, 263
  • [49] Analysis of Inertial Motion in Swing Phase of Human Gait and Its Application to Motion Generation of Transfemoral Prosthesis
    Wada, Takahiro
    Sano, Hiroshi
    Sekimoto, Masahiro
    2014 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2014), 2014, : 2075 - 2080
  • [50] Brain Affective System Inspired Control Architecture: An Application to Nonlinear System
    Qutubuddin, Md
    Gibo, Tilahun Kochito
    Bapi, Raju S.
    Narri, Yadaiah
    IEEE ACCESS, 2021, 9 : 86565 - 86580