Towards a Multiple-Lookahead-Levels agent reinforcement-learning technique and its implementation in integrated circuits

Cited by: 0
Authors
H. S. Al-Dayaa
D. B. Megherbi
Affiliations
[1] University of Massachusetts
Source
The Journal of Supercomputing | 2012 / Vol. 62
Keywords
Reinforcement learning; Q-Learning; Machine learning; Integrated circuits; Lyapunov stability; Robotics
DOI
Not available
Abstract
Reinforcement learning (RL) techniques have contributed, and continue to contribute, tremendously to the advancement of machine learning and its many recent applications. As is well known, the main limitations of existing RL techniques are, in general, their slow convergence and their computational complexity. The contributions of this paper are twofold. First, it introduces a reinforcement-learning technique that uses multiple look-ahead levels, granting an autonomous agent more visibility into its environment and helping it learn faster. This technique extends Watkins's Q-Learning algorithm via the Multiple-Lookahead-Levels (MLL) model equation that we develop and present here. We analyze the convergence of the MLL equation and prove its effectiveness, and we propose and implement a method to compute the improvement rate of the agent's learning speed between different look-ahead levels. Both the time and space complexities are examined. Results show that the number of steps required to reach the goal per learning path decreases exponentially with the learning-path number (time). Results also show that, at any given time, the number of steps per learning path is, to some degree, smaller when the number of look-ahead levels is higher (space). Furthermore, we analyze the MLL system in the time domain and prove its temporal stability using Lyapunov theory. Second, based on this Lyapunov stability analysis, we propose, for the first time, a circuit architecture for a software-configurable hardware implementation of the MLL technique for real-time applications.
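The abstract references the MLL model equation but does not reproduce it. For orientation only, the sketch below shows one plausible way a multi-level look-ahead target generalizes Watkins's one-step target, assuming a tabular Q array over integer states and a hypothetical deterministic environment model env_model; with levels=1 it reduces to standard Q-Learning. This is an assumption-laden illustration of the general idea, not the paper's MLL equation.

```python
import numpy as np

def mll_q_update(Q, env_model, s, a, alpha=0.1, gamma=0.9, levels=2):
    """One Q-value update with a multi-level greedy look-ahead target.

    Illustrative sketch only: `env_model(state, action) -> (next_state,
    reward)` is an assumed deterministic environment model, and the
    target below is a standard n-step greedy look-ahead; the paper's
    actual MLL model equation is developed in the article itself.
    """
    s_k, a_k = s, a
    ret, discount = 0.0, 1.0
    # Accumulate discounted rewards along a greedy path `levels` steps deep.
    for _ in range(levels):
        s_next, r = env_model(s_k, a_k)
        ret += discount * r
        discount *= gamma
        s_k = s_next
        a_k = int(np.argmax(Q[s_k]))  # greedy action at the next level
    # Bootstrap from the value of the deepest state reached.
    target = ret + discount * np.max(Q[s_k])
    Q[s, a] += alpha * (target - Q[s, a])
```

Deepening the look-ahead enlarges the horizon of each update, which matches the abstract's reported trade-off: more computation per update (space) in exchange for fewer steps per learning path over time.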
Pages: 588–615
Page count: 27