A Scalable Parallel Q-Learning Algorithm for Resource Constrained Decentralized Computing Environments

Cited by: 7
Authors:
Camelo, Miguel [1]
Famaey, Jeroen [1]
Latre, Steven [1]
Affiliations:
[1] Univ Antwerp, IMEC, Dept Math & Comp Sci, Middelheimlaan 1, B-2020 Antwerp, Belgium
Source:
PROCEEDINGS OF 2016 2ND WORKSHOP ON MACHINE LEARNING IN HPC ENVIRONMENTS (MLHPC) | 2016
DOI:
10.1109/MLHPC.2016.007
Chinese Library Classification:
TP18 [Artificial Intelligence Theory];
Subject Classification Codes:
081104 ; 0812 ; 0835 ; 1405 ;
Abstract:
The Internet of Things (IoT) is increasingly becoming a platform for mission-critical applications with stringent requirements in terms of response time and mobility. Therefore, a centralized High Performance Computing (HPC) environment is often not suitable or simply non-existent. Instead, there is a need for a scalable HPC model that supports the deployment of applications on the decentralized but resource-constrained devices of the IoT. Recently, Reinforcement Learning (RL) algorithms have been used for decision making within applications by directly interacting with the environment. However, most RL algorithms are designed for centralized environments and are time and resource consuming. Therefore, they are not applicable to such constrained decentralized computing environments. In this paper, we propose a scalable Parallel Q-Learning (PQL) algorithm for resource-constrained environments. By combining a table partition strategy with a co-allocation of both processing and storage, we can significantly reduce the individual resource cost and, at the same time, guarantee convergence and minimize the communication cost. Experimental results show that our algorithm reduces the required training time in proportion to the number of Q-Learning agents and, in terms of execution time, is up to 24 times faster than several well-known PQL algorithms.
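The abstract's core idea, partitioning the Q-table so that each agent stores and updates only its own shard, can be illustrated with a minimal sketch. This is a hypothetical toy (hash-based state partitioning, a small deterministic chain environment, made-up constants), not the authors' actual algorithm; it only shows how co-locating a Q-table partition with the agent that updates it keeps per-device storage small while the standard Q-learning update still converges:

```python
# Hypothetical sketch of a partitioned Q-table, NOT the paper's exact algorithm.
# Each "agent" owns a disjoint shard of Q-rows (states hashed by index), so no
# single device stores the full table; an update on another shard's next-state
# row would be a message between agents in a real decentralized deployment.

N_STATES, N_ACTIONS, N_AGENTS = 6, 2, 3   # illustrative sizes
ALPHA, GAMMA = 0.5, 0.9                   # illustrative learning rate / discount

# Agent i holds exactly the states s with s % N_AGENTS == i.
partitions = [{s: [0.0] * N_ACTIONS for s in range(N_STATES) if s % N_AGENTS == i}
              for i in range(N_AGENTS)]

def owner(state):
    """Index of the agent whose partition holds this state's Q-row."""
    return state % N_AGENTS

def q_update(state, action, reward, next_state):
    """Standard Q-learning update applied by the row's owner; reading
    max Q(next_state, .) may touch a different agent's partition."""
    row = partitions[owner(state)][state]
    best_next = max(partitions[owner(next_state)][next_state])
    row[action] += ALPHA * (reward + GAMMA * best_next - row[action])

# Toy chain environment: action 1 moves right, action 0 moves left;
# reward 1 only for stepping from the next-to-last state into the goal.
for _ in range(50):                       # deterministic sweeps over all (s, a)
    for s in range(N_STATES - 1):
        for a in range(N_ACTIONS):
            s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if (s == N_STATES - 2 and a == 1) else 0.0
            q_update(s, a, r, s2)

# Q-value of moving right from the next-to-last state converges to 1.
print(round(partitions[owner(N_STATES - 2)][N_STATES - 2][1], 2))  # → 1.0
```

Each agent here stores only 2 of the 6 Q-rows, which is the storage-side benefit the abstract describes; the cross-partition read of `max Q(next_state, .)` is where the paper's communication-cost concern enters.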
Pages: 27-35 (9 pages)