Unsupervised Modeling of Partially Observable Environments

Cited by: 0
Authors
Graziano, Vincent [1 ]
Koutnik, Jan [1 ]
Schmidhuber, Juergen [1 ]
Affiliation
[1] Univ Lugano, SUPSI, IDSIA, CH-6928 Manno, Switzerland
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT I | 2011 / Vol. 6911
Keywords
Self-Organizing Maps; POMDPs; Reinforcement Learning;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
We present an architecture based on self-organizing maps for learning a sensory layer in a learning system. The architecture, Temporal Network for Transitions (TNT), enjoys the freedoms of unsupervised learning, works on-line in non-episodic environments, is computationally light, and scales well. TNT generates a predictive model of its internal representation of the world, making planning methods available for both exploitation and exploration of the environment. Experiments demonstrate that TNT learns good representations of classical reinforcement-learning mazes of varying size (up to 20 x 20) under high noise and stochastic actions.
Pages: 503-515
Page count: 13
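
The abstract describes the approach only at a high level: a self-organizing map quantizes raw observations into discrete internal states, and a predictive transition model over those states is used for planning. The following is a minimal illustrative sketch of that idea under assumed details (1-D SOM lattice, fixed learning rate and neighbourhood width, count-based transition estimates); it is not the authors' TNT implementation.

# Minimal sketch: SOM-based sensory layer plus a transition model over
# SOM units. All hyperparameters and update rules here are assumptions.
import numpy as np

class SOMTransitionModel:
    def __init__(self, n_units=16, obs_dim=2, n_actions=4,
                 lr=0.1, sigma=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(size=(n_units, obs_dim))  # SOM codebook
        self.grid = np.arange(n_units)                       # 1-D lattice coordinates
        self.lr, self.sigma = lr, sigma
        # Laplace-smoothed transition counts: counts[a, s, s'] over SOM units.
        self.counts = np.ones((n_actions, n_units, n_units))

    def quantize(self, obs):
        # Index of the best-matching unit (internal state) for an observation.
        return int(np.argmin(np.linalg.norm(self.weights - obs, axis=1)))

    def update_som(self, obs):
        # Standard online SOM update: pull the winner and its neighbours
        # toward the observation, weighted by a Gaussian neighbourhood.
        bmu = self.quantize(obs)
        dist = np.abs(self.grid - bmu)
        h = np.exp(-(dist ** 2) / (2 * self.sigma ** 2))
        self.weights += self.lr * h[:, None] * (obs - self.weights)
        return bmu

    def observe_transition(self, s, action, s_next):
        # Record a transition between internal states under an action.
        self.counts[action, s, s_next] += 1

    def transition_probs(self, s, action):
        # Predictive model over the internal representation, usable by a planner.
        row = self.counts[action, s]
        return row / row.sum()

# Assumed toy usage: quantize each observation on-line, then count the
# observed (state, action, next state) transitions for planning.
#   model = SOMTransitionModel()
#   s = model.update_som(obs)
#   s_next = model.update_som(next_obs)
#   model.observe_transition(s, action, s_next)
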