Multi-timescale nexting in a reinforcement learning robot

Cited by: 37
Authors
Modayil, Joseph [1 ]
White, Adam [1 ]
Sutton, Richard S. [1 ]
Affiliations
[1] Univ Alberta, Reinforcement Learning & Artificial Intelligence, Edmonton, AB, Canada
Keywords
Reinforcement learning; robotics; predictive knowledge; temporal difference learning; FUTURE;
DOI
10.1177/1059712313511648
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The term 'nexting' has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense. The ability to 'next' constitutes a basic kind of awareness and knowledge of one's environment. In this paper we present results with a robot that learns to next in real time, making thousands of predictions about sensory input signals at timescales from 0.1 to 8 seconds. Our predictions are formulated as a generalization of the value functions commonly used in reinforcement learning, where now an arbitrary function of the sensory input signals is used as a pseudo reward, and the discount rate determines the timescale. We show that six thousand predictions, each computed as a function of six thousand features of the state, can be learned and updated online ten times per second on a laptop computer, using the standard temporal-difference(λ) algorithm with linear function approximation. This approach is sufficiently computationally efficient to be used for real-time learning on the robot and sufficiently data efficient to achieve substantial accuracy within 30 minutes. Moreover, a single tile-coded feature representation suffices to accurately predict many different signals over a significant range of timescales. We also extend nexting beyond simple timescales by letting the discount rate be a function of the state and show that nexting predictions of this more general form can also be learned with substantial accuracy. General nexting provides a simple yet powerful mechanism for a robot to acquire predictive knowledge of the dynamics of its environment.
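The learning scheme the abstract describes, one nexting prediction trained online with linear TD(λ), can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name, the one-hot test features, and the parameter values are assumptions; in the paper the features come from tile coding and the discount rate γ sets the timescale (at 10 Hz, γ = 0.9 corresponds to roughly a 1-second horizon).

```python
import numpy as np

def nexting_td_lambda(pseudo_rewards, features, gamma=0.9, alpha=0.1, lam=0.9):
    """Learn one nexting prediction online with linear TD(lambda).

    pseudo_rewards: sequence of scalar signals to predict (e.g. a sensor reading)
    features: sequence of feature vectors for each time step (e.g. tile-coded state)
    gamma: discount rate; the prediction timescale is roughly 1/(1 - gamma) steps
    Returns the learned weights and the prediction made at each step.
    """
    n = len(features[0])
    w = np.zeros(n)            # weights: the prediction is the dot product w . phi
    e = np.zeros(n)            # eligibility trace
    preds = []
    phi = np.asarray(features[0], dtype=float)
    for t in range(len(pseudo_rewards)):
        preds.append(w @ phi)  # prediction before this step's update
        phi_next = (np.asarray(features[t + 1], dtype=float)
                    if t + 1 < len(features) else np.zeros(n))
        # TD error: pseudo reward plus discounted next prediction, minus current
        delta = pseudo_rewards[t] + gamma * (w @ phi_next) - w @ phi
        e = gamma * lam * e + phi          # accumulating eligibility trace
        w = w + alpha * delta * e
        phi = phi_next
    return w, preds
```

On a constant pseudo-reward signal of 1 with a single always-active feature, the prediction converges toward the discounted sum 1/(1 − γ) = 10, matching the interpretation of γ as a timescale.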
Pages: 146-160
Page count: 15
Related papers
50 in total
  • [1] Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning
    Emami, Patrick
    Zhang, Xiangyu
    Biagioni, David
    Zamzam, Ahmed S.
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 2372 - 2378
  • [2] Multi-timescale Deep Reinforcement Learning for Reactive Power Optimization of Distribution Network
    Hu D.
    Peng Y.
    Wei W.
    Xiao T.
    Cai T.
    Xi W.
    Zhongguo Dianji Gongcheng Xuebao/Proceedings of the Chinese Society of Electrical Engineering, 2022, 42 (14): : 5034 - 5044
  • [3] Dynamic Lane Traffic Signal Control with Group Attention and Multi-Timescale Reinforcement Learning
    Jiang, Qize
    Li, Jingze
    Sun, Weiwei
    Zheng, Baihua
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 3642 - 3648
  • [4] EdgeTimer: Adaptive Multi-Timescale Scheduling in Mobile Edge Computing with Deep Reinforcement Learning
    Hao, Yijun
    Yang, Shusen
    Li, Fang
    Zhang, Yifan
    Wang, Shibo
    Ren, Xuebin
    IEEE INFOCOM 2024-IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2024, : 671 - 680
  • [5] Multi-timescale voltage control for distribution system based on multi-agent deep reinforcement learning
    Wu, Zhi
    Li, Yiqi
    Gu, Wei
    Dong, Zengbo
    Zhao, Jingtao
    Liu, Weiliang
    Zhang, Xiao-Ping
    Liu, Pengxiang
    Sun, Qirun
    INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2023, 147
  • [6] Reinforcement Learning of Multi-Timescale Forecast Information for Designing Operating Policies of Multi-Purpose Reservoirs
    Zanutto, D.
    Ficchi, A.
    Giuliani, M.
    Castelletti, A.
    WATER RESOURCES RESEARCH, 2025, 61 (02)
  • [7] DEALING WITH NON-STATIONARITY IN DECENTRALIZED COOPERATIVE MULTI-AGENT DEEP REINFORCEMENT LEARNING VIA MULTI-TIMESCALE LEARNING
    Nekoei, Hadi
    Badrinaaraayanan, Akilesh
    Sinha, Amit
    Amini, Mohammad
    Rajendran, Janarthanan
    Mahajan, Aditya
    Chandar, Sarath
    CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 232, 2023, 232 : 376 - 398
  • [8] A soft actor-critic deep reinforcement learning method for multi-timescale coordinated operation of microgrids
    Hu, Chunchao
    Cai, Zexiang
    Zhang, Yanxu
    Yan, Rudai
    Cai, Yu
    Cen, Bowei
    PROTECTION AND CONTROL OF MODERN POWER SYSTEMS, 2022, 7 (01)
  • [9] Multi-Timescale Voltage Control Method Using Limited Measurable Information with Explainable Deep Reinforcement Learning
    Matsushima, Fumiya
    Aoki, Mutsumi
    Nakamura, Yuta
    Verma, Suresh Chand
    Ueda, Katsuhisa
    Imanishi, Yusuke
    ENERGIES, 2025, 18 (03)
  • [10] Multi-Timescale Collaborative Tracking
    Chen, Dapeng
    Yuan, Zejian
    Hua, Gang
    Wang, Jingdong
    Zheng, Nanning
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (01) : 141 - 155