Multi-timescale nexting in a reinforcement learning robot

Cited by: 37
Authors
Modayil, Joseph [1]
White, Adam [1]
Sutton, Richard S. [1]
Affiliations
[1] Univ Alberta, Reinforcement Learning & Artificial Intelligence, Edmonton, AB, Canada
Keywords
Reinforcement learning; robotics; predictive knowledge; temporal difference learning; FUTURE;
DOI
10.1177/1059712313511648
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The term 'nexting' has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense. The ability to 'next' constitutes a basic kind of awareness and knowledge of one's environment. In this paper we present results with a robot that learns to next in real time, making thousands of predictions about sensory input signals at timescales from 0.1 to 8 seconds. Our predictions are formulated as a generalization of the value functions commonly used in reinforcement learning, where now an arbitrary function of the sensory input signals is used as a pseudo reward, and the discount rate determines the timescale. We show that six thousand predictions, each computed as a function of six thousand features of the state, can be learned and updated online ten times per second on a laptop computer, using the standard temporal-difference(λ) algorithm with linear function approximation. This approach is sufficiently computationally efficient to be used for real-time learning on the robot and sufficiently data efficient to achieve substantial accuracy within 30 minutes. Moreover, a single tile-coded feature representation suffices to accurately predict many different signals over a significant range of timescales. We also extend nexting beyond simple timescales by letting the discount rate be a function of the state and show that nexting predictions of this more general form can also be learned with substantial accuracy. General nexting provides a simple yet powerful mechanism for a robot to acquire predictive knowledge of the dynamics of its environment.
Pages: 146-160
Page count: 15
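
The formulation in the abstract (a sensory signal used as a pseudo reward, a discount rate that sets the timescale, and temporal-difference(λ) learning with linear function approximation) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the record: the names `gamma_for_timescale`, `NextingPredictor`, and `run_demo` are hypothetical, the step size `alpha` and trace parameter `lam` are placeholder values, the timescale is assumed to follow the common convention of 1/(1 - gamma) time steps at 10 updates per second, and a one-hot state code on a toy cyclic signal stands in for the tile-coded robot features.

```python
def gamma_for_timescale(seconds, step_seconds=0.1):
    """Discount rate for a given timescale.

    Assumed convention: a discount rate gamma corresponds to a horizon of
    1 / (1 - gamma) time steps, with one step every `step_seconds`.
    """
    return 1.0 - step_seconds / seconds


class NextingPredictor:
    """One linear TD(lambda) predictor over binary features (illustrative)."""

    def __init__(self, n_features, gamma, alpha=0.1, lam=0.9):
        self.w = [0.0] * n_features  # weight vector
        self.e = [0.0] * n_features  # accumulating eligibility trace
        self.gamma, self.alpha, self.lam = gamma, alpha, lam

    def predict(self, active):
        # With binary features the linear prediction is the sum of the
        # weights at the active feature indices.
        return sum(self.w[i] for i in active)

    def update(self, active, pseudo_reward, next_active):
        # TD error: observed signal plus discounted next prediction,
        # minus the current prediction.
        delta = (pseudo_reward
                 + self.gamma * self.predict(next_active)
                 - self.predict(active))
        decay = self.gamma * self.lam
        for i in range(len(self.e)):
            self.e[i] *= decay
        for i in active:
            self.e[i] += 1.0
        for i in range(len(self.w)):
            self.w[i] += self.alpha * delta * self.e[i]
        return delta


def run_demo(n_states=20, seconds=0.5, steps=20000):
    """Learn to predict a pulse that fires once per cycle of a toy loop."""
    gamma = gamma_for_timescale(seconds)
    predictor = NextingPredictor(n_states, gamma)
    s = 0
    for _ in range(steps):
        s_next = (s + 1) % n_states
        signal = 1.0 if s_next == 0 else 0.0  # toy pseudo reward
        predictor.update([s], signal, [s_next])
        s = s_next
    preds = [predictor.predict([i]) for i in range(n_states)]
    # Closed-form discounted sum of the pulse for this deterministic loop.
    truth = [gamma ** (n_states - 1 - i) / (1.0 - gamma ** n_states)
             for i in range(n_states)]
    return preds, truth
```

Calling `run_demo()` returns the learned predictions alongside the closed-form discounted sums for the toy loop, so the two lists can be compared directly. Under the assumed convention, `gamma_for_timescale(0.1)` and `gamma_for_timescale(8.0)` give the discount rates for the shortest and longest timescales mentioned in the abstract. Learning many predictions at once, as the abstract describes, would amount to running many such predictors over one shared feature vector.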