Identification and off-policy learning of multiple objectives using adaptive clustering

Cited by: 12
Authors
Karimpanal, Thommen George [1]
Wilhelm, Erik [1]
Affiliations
[1] Singapore Univ Technol & Design, Engn Prod Dev, 8 Somapah Rd, Singapore 487372, Singapore
Keywords
Reinforcement learning; Q-learning; Off-policy; Adaptive clustering; Multiobjective learning; REINFORCEMENT; FRAMEWORK
DOI
10.1016/j.neucom.2017.04.074
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we present a methodology that enables an agent to make efficient use of its exploratory actions by autonomously identifying possible objectives in its environment and learning them in parallel. The identification of objectives is achieved using an online and unsupervised adaptive clustering algorithm. The identified objectives are learned (at least partially) in parallel using Q-learning. Using a simulated agent and environment, it is shown that the converged or partially converged value function weights resulting from off-policy learning can be used to accumulate knowledge about multiple objectives without any additional exploration. We claim that the proposed approach could be useful in scenarios where the objectives are initially unknown or in real-world scenarios where exploration is typically a time- and energy-intensive process. The implications and possible extensions of this work are also briefly discussed. (C) 2017 Elsevier B.V. All rights reserved.
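The core idea of the abstract, learning several objectives off-policy from one stream of exploratory actions, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical 7-state corridor, two objectives given as goal states (standing in for objectives that the clustering step would have identified), and tabular Q-values rather than the value function weights mentioned in the abstract. All names and constants are illustrative.

```python
import random

N_STATES = 7                      # hypothetical 1-D corridor, states 0..6
ACTIONS = (-1, +1)                # step left or step right
GOALS = {"left": 0, "right": 6}   # assumed objectives (stand-ins for clustered ones)
ALPHA, GAMMA = 0.5, 0.9           # illustrative learning rate and discount factor


def step(state, action):
    """Deterministic move, clipped at the corridor ends."""
    return min(max(state + action, 0), N_STATES - 1)


def learn(n_steps=5000, seed=0):
    """Run one random-walk behaviour policy and update every objective's Q-table."""
    rng = random.Random(seed)
    q = {name: [[0.0, 0.0] for _ in range(N_STATES)] for name in GOALS}
    state = N_STATES // 2
    for _ in range(n_steps):
        a = rng.randrange(len(ACTIONS))      # purely exploratory action choice
        nxt = step(state, ACTIONS[a])
        # The same transition updates all objectives (off-policy Q-learning).
        for name, goal in GOALS.items():
            reached = nxt == goal
            reward = 1.0 if reached else 0.0
            # Reaching the goal is treated as terminal for that objective.
            bootstrap = 0.0 if reached else GAMMA * max(q[name][nxt])
            q[name][state][a] += ALPHA * (reward + bootstrap - q[name][state][a])
        state = nxt
    return q


def greedy_action(q_table, state):
    """Action with the highest learned value in the given state."""
    best = max(range(len(ACTIONS)), key=lambda a: q_table[state][a])
    return ACTIONS[best]
```

After a single exploratory run, `greedy_action(q["left"], 3)` and `greedy_action(q["right"], 3)` give a separate greedy policy for each objective, although the agent never followed either policy while exploring.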
Pages: 39-47
Page count: 9