Multi-timescale nexting in a reinforcement learning robot

Cited by: 37
Authors
Modayil, Joseph [1 ]
White, Adam [1 ]
Sutton, Richard S. [1 ]
Affiliations
[1] Univ Alberta, Reinforcement Learning & Artificial Intelligence, Edmonton, AB, Canada
Keywords
Reinforcement learning; robotics; predictive knowledge; temporal difference learning; FUTURE;
DOI
10.1177/1059712313511648
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The term 'nexting' has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense. The ability to 'next' constitutes a basic kind of awareness and knowledge of one's environment. In this paper we present results with a robot that learns to next in real time, making thousands of predictions about sensory input signals at timescales from 0.1 to 8 seconds. Our predictions are formulated as a generalization of the value functions commonly used in reinforcement learning, where now an arbitrary function of the sensory input signals is used as a pseudo reward, and the discount rate determines the timescale. We show that six thousand predictions, each computed as a function of six thousand features of the state, can be learned and updated online ten times per second on a laptop computer, using the standard temporal-difference(λ) algorithm with linear function approximation. This approach is sufficiently computationally efficient to be used for real-time learning on the robot and sufficiently data efficient to achieve substantial accuracy within 30 minutes. Moreover, a single tile-coded feature representation suffices to accurately predict many different signals over a significant range of timescales. We also extend nexting beyond simple timescales by letting the discount rate be a function of the state and show that nexting predictions of this more general form can also be learned with substantial accuracy. General nexting provides a simple yet powerful mechanism for a robot to acquire predictive knowledge of the dynamics of its environment.
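The learning scheme the abstract describes, one nexting prediction trained online with linear TD(λ), can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name, the one-hot test features, and the parameter values are assumptions; in the paper the features come from tile coding and the discount rate γ sets the timescale (at 10 Hz, γ = 0.9 corresponds to roughly a 1-second horizon).

```python
import numpy as np

def nexting_td_lambda(pseudo_rewards, features, gamma=0.9, alpha=0.1, lam=0.9):
    """Learn one nexting prediction online with linear TD(lambda).

    pseudo_rewards: sequence of scalar signals to predict (e.g. a sensor reading)
    features: sequence of feature vectors for each time step (e.g. tile-coded state)
    gamma: discount rate; the prediction timescale is roughly 1/(1 - gamma) steps
    Returns the learned weights and the prediction made at each step.
    """
    n = len(features[0])
    w = np.zeros(n)            # weights: the prediction is the dot product w . phi
    e = np.zeros(n)            # eligibility trace
    preds = []
    phi = np.asarray(features[0], dtype=float)
    for t in range(len(pseudo_rewards)):
        preds.append(w @ phi)  # prediction before this step's update
        phi_next = (np.asarray(features[t + 1], dtype=float)
                    if t + 1 < len(features) else np.zeros(n))
        # TD error: pseudo reward plus discounted next prediction, minus current
        delta = pseudo_rewards[t] + gamma * (w @ phi_next) - w @ phi
        e = gamma * lam * e + phi          # accumulating eligibility trace
        w = w + alpha * delta * e
        phi = phi_next
    return w, preds
```

On a constant pseudo-reward signal of 1 with a single always-active feature, the prediction converges toward the discounted sum 1/(1 − γ) = 10, matching the interpretation of γ as a timescale.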
Pages: 146-160
Page count: 15
Related papers
50 in total
  • [1] Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning
    Emami, Patrick
    Zhang, Xiangyu
    Biagioni, David
    Zamzam, Ahmed S.
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 2372 - 2378
  • [2] Multi-timescale Deep Reinforcement Learning for Reactive Power Optimization of Distribution Network
    Hu D.
    Peng Y.
    Wei W.
    Xiao T.
    Cai T.
    Xi W.
    Zhongguo Dianji Gongcheng Xuebao/Proceedings of the Chinese Society of Electrical Engineering, 2022, 42 (14): : 5034 - 5044
  • [3] Dynamic Lane Traffic Signal Control with Group Attention and Multi-Timescale Reinforcement Learning
    Jiang, Qize
    Li, Jingze
    Sun, Weiwei
    Zheng, Baihua
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 3642 - 3648
  • [4] EdgeTimer: Adaptive Multi-Timescale Scheduling in Mobile Edge Computing with Deep Reinforcement Learning
    Hao, Yijun
    Yang, Shusen
    Li, Fang
    Zhang, Yifan
    Wang, Shibo
    Ren, Xuebin
    IEEE INFOCOM 2024-IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2024, : 671 - 680
  • [5] Multi-timescale voltage control for distribution system based on multi-agent deep reinforcement learning
    Wu, Zhi
    Li, Yiqi
    Gu, Wei
    Dong, Zengbo
    Zhao, Jingtao
    Liu, Weiliang
    Zhang, Xiao-Ping
    Liu, Pengxiang
    Sun, Qirun
    INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2023, 147
  • [6] Reinforcement Learning of Multi-Timescale Forecast Information for Designing Operating Policies of Multi-Purpose Reservoirs
    Zanutto, D.
    Ficchi, A.
    Giuliani, M.
    Castelletti, A.
    WATER RESOURCES RESEARCH, 2025, 61 (02)
  • [7] DEALING WITH NON-STATIONARITY IN DECENTRALIZED COOPERATIVE MULTI-AGENT DEEP REINFORCEMENT LEARNING VIA MULTI-TIMESCALE LEARNING
    Nekoei, Hadi
    Badrinaaraayanan, Akilesh
    Sinha, Amit
    Amini, Mohammad
    Rajendran, Janarthanan
    Mahajan, Aditya
    Chandar, Sarath
    CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 232, 2023, 232 : 376 - 398
  • [8] A soft actor-critic deep reinforcement learning method for multi-timescale coordinated operation of microgrids
    Hu, Chunchao
    Cai, Zexiang
    Zhang, Yanxu
    Yan, Rudai
    Cai, Yu
    Cen, Bowei
    PROTECTION AND CONTROL OF MODERN POWER SYSTEMS, 2022, 7 (01)
  • [9] Multi-Timescale Voltage Control Method Using Limited Measurable Information with Explainable Deep Reinforcement Learning
    Matsushima, Fumiya
    Aoki, Mutsumi
    Nakamura, Yuta
    Verma, Suresh Chand
    Ueda, Katsuhisa
    Imanishi, Yusuke
    ENERGIES, 2025, 18 (03)
  • [10] Multi-Timescale Collaborative Tracking
    Chen, Dapeng
    Yuan, Zejian
    Hua, Gang
    Wang, Jingdong
    Zheng, Nanning
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (01) : 141 - 155