Knowledge-based Exploration for Reinforcement Learning in Self-Organizing Neural Networks

Cited by: 10
Authors
Teng, Teck-Hou [1]
Tan, Ah-Hwee [1]
Affiliation
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Source
2012 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT 2012), VOL 2 | 2012
Keywords
Reinforcement Learning; Self-Organizing Neural Network; Directed Exploration; Rule-Based System; ARCHITECTURE; PURSUIT; EVASION;
DOI
10.1109/WI-IAT.2012.154
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Exploration is necessary during reinforcement learning to discover new solutions in a given problem space. Most reinforcement learning systems, however, adopt a simple strategy of randomly selecting an action from among all the available actions. This paper proposes a novel exploration strategy, known as Knowledge-based Exploration, for guiding the exploration of a family of self-organizing neural networks in reinforcement learning. Specifically, exploration is directed towards unexplored and favorable action choices while steering away from negative action choices that are likely to fail. This is achieved by using the learned knowledge of the agent to identify prior action choices that led to low Q-values in similar situations. Consequently, the agent is expected to learn the right solutions in a shorter time, improving overall learning efficiency. Using a Pursuit-Evasion problem domain, we evaluate the efficacy of the knowledge-based exploration strategy in terms of task performance, rate of learning, and model complexity. Comparison with random exploration and three other heuristic-based directed exploration strategies shows that Knowledge-based Exploration is significantly more effective and robust for reinforcement learning in real time.
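The exploration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the paper works with self-organizing neural networks, whereas this sketch assumes a plain dictionary of learned Q-values for the current (or a similar) situation. The function name `knowledge_based_explore`, the `known_q` dictionary, and the `threshold` cutoff for a "low" Q-value are all hypothetical names introduced for illustration.

```python
import random


def knowledge_based_explore(actions, known_q, threshold=0.5, rng=random):
    """Pick an exploratory action while avoiding known-bad choices.

    actions:   list of all available actions.
    known_q:   dict mapping action -> learned Q-value (assumed lookup table,
               standing in for the agent's learned knowledge). Actions that
               are absent from the dict count as unexplored.
    threshold: assumed cutoff; Q-values below it mark a negative choice.
    """
    # Keep actions that are unexplored or have a favorable learned Q-value.
    candidates = [
        a for a in actions
        if a not in known_q or known_q[a] >= threshold
    ]
    # If every action is known to be negative, fall back to random exploration.
    if not candidates:
        candidates = list(actions)
    return rng.choice(candidates)


if __name__ == "__main__":
    moves = ["left", "right", "up", "down"]
    learned = {"left": 0.1, "right": 0.9}  # "left" failed before
    print(knowledge_based_explore(moves, learned))
```

In this sketch "left" is never chosen as long as some unexplored or favorable action remains, which mirrors the abstract's description of steering away from action choices that are likely to fail.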
Pages: 332-339
Page count: 8