Reinforcement learning with expectation and action augmented states in partially observable environment

Times Cited: 0
Authors
Guirnaldo, SA [1]
Watanabe, K [1]
Izumi, K [1]
Kiguchi, K [1]
Affiliation
[1] Saga Univ, Fac Engn Syst & Technol, Grad Sch Sci & Engn, Saga 8408502, Japan
Source
SICE 2002: PROCEEDINGS OF THE 41ST SICE ANNUAL CONFERENCE, VOLS 1-5 | 2002
Keywords
partially observable Markov decision processes; expectation; reinforcement learning; perception; perceptual aliasing
DOI
Not available
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
The problem of developing good or optimal policies for partially observable Markov decision processes (POMDPs) remains one of the most alluring areas of research in artificial intelligence. Encouraged by the way we (humans) form expectations from past experiences, and by how our decisions and behaviour are affected by those expectations, this paper proposes a method called expectation and action augmented states (EAAS) for reinforcement learning, aimed at discovering good or near-optimal policies in partially observable environments. The method uses the concept of expectation to distinguish between aliased states. It works by augmenting the agent's observation with its expectation of that observation. Two problems from the literature were used to test the proposed method. The results show promising characteristics of the method compared to some methods currently used in this domain.
Pages: 823-828
Page count: 6
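The abstract describes EAAS only at a high level, so the following Python sketch is purely illustrative of the general idea: a tabular Q-learner whose state is the raw observation augmented with a learned expectation of that observation (plus the action that produced it), which lets two perceptually aliased cells of a toy corridor map to distinct augmented states. Every name and design choice here (ExpectationModel, AliasedCorridor, eaas_q_learning, the frequency-count predictor) is an assumption made for illustration, not the paper's actual algorithm.

import random
from collections import defaultdict


class ExpectationModel:
    """Hypothetical one-step predictor: the expectation of the next
    observation is the observation seen most often after taking
    `action` from `prev_obs`."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, prev_obs, action, obs):
        self.counts[(prev_obs, action)][obs] += 1

    def expect(self, prev_obs, action):
        seen = self.counts[(prev_obs, action)]
        return max(seen, key=seen.get) if seen else None


class AliasedCorridor:
    """Toy POMDP: cells 0..4 with the goal in the middle (cell 2).
    Cells 1 and 3 emit the same observation 'A' but require opposite
    actions, so a purely reactive policy cannot act optimally."""

    n_actions = 2  # 0 = move left, 1 = move right

    def reset(self):
        self.pos = random.choice([0, 4])
        return self._obs()

    def _obs(self):
        return 'A' if self.pos in (1, 3) else str(self.pos)

    def step(self, action):
        self.pos = max(0, min(4, self.pos + (1 if action == 1 else -1)))
        done = (self.pos == 2)
        return self._obs(), (1.0 if done else -0.1), done


def eaas_q_learning(env, episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning over augmented states
    (observation, expected observation, previous action)."""
    Q = defaultdict(float)
    model = ExpectationModel()
    for _ in range(episodes):
        obs = env.reset()
        state = (obs, None, None)  # no expectation or previous action yet
        done = False
        while not done:
            if random.random() < eps:
                action = random.randrange(env.n_actions)
            else:
                action = max(range(env.n_actions), key=lambda a: Q[(state, a)])
            next_obs, reward, done = env.step(action)
            model.update(obs, action, next_obs)
            # Augment the raw observation with the learned expectation of it
            # and the action that produced it; this separates aliased cells
            # reached along different paths.
            next_state = (next_obs, model.expect(obs, action), action)
            target = reward if done else reward + gamma * max(
                Q[(next_state, a)] for a in range(env.n_actions))
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            obs, state = next_obs, next_state
    return Q


if __name__ == "__main__":
    Q = eaas_q_learning(AliasedCorridor())
    # The two aliased cells map to distinct augmented states with opposite
    # greedy actions (go right after arriving from '0', left after '4').
    print(Q[(('A', 'A', 1), 1)], Q[(('A', 'A', 1), 0)])
    print(Q[(('A', 'A', 0), 0)], Q[(('A', 'A', 0), 1)])

Running this sketch for a few thousand episodes typically yields opposite greedy actions for the two aliased cells, which is the disambiguation effect the abstract attributes to expectation-based state augmentation.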
Related Papers
50 records in total
[31]   CHQ: A multi-agent reinforcement learning scheme for partially observable Markov decision processes [J].
Osada, H ;
Fujita, S .
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (05) :1004-1011
[32]   Determining maintenance policies for partially observable multi-component systems with deep reinforcement learning [J].
Karabag, Oktay .
PAMUKKALE UNIVERSITY JOURNAL OF ENGINEERING SCIENCES-PAMUKKALE UNIVERSITESI MUHENDISLIK BILIMLERI DERGISI, 2025, 31 (02) :166-179
[33]   Reinforcement learning with internal expectation for the random neural network [J].
Halici, U .
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2000, 126 (02) :288-307
[34]   Model-free Control of Partially Observable Underactuated Systems by pairing Reinforcement Learning with Delay Embeddings [J].
Knudsen, Martinius ;
Hendseth, Sverre ;
Tufte, Gunnar ;
Sandvig, Axel .
MODELING IDENTIFICATION AND CONTROL, 2022, 43 (01) :1-8
[35]   Memory-driven deep-reinforcement learning for autonomous robot navigation in partially observable environments [J].
Montero, Estrella ;
Pico, Nabih ;
Ghergherehchi, Mitra ;
Song, Ho Seung .
ENGINEERING SCIENCE AND TECHNOLOGY-AN INTERNATIONAL JOURNAL-JESTECH, 2025, 62
[36]   A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes [J].
Ross, Stephane ;
Pineau, Joelle ;
Chaib-draa, Brahim ;
Kreitmann, Pierre .
JOURNAL OF MACHINE LEARNING RESEARCH, 2011, 12 :1729-1770
[37]   Policy Reuse for Learning and Planning in Partially Observable Markov Decision Processes [J].
Wu, Bo ;
Feng, Yanpeng .
2017 4TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE), 2017, :549-552
[38]   Guided Soft Actor Critic: A Guided Deep Reinforcement Learning Approach for Partially Observable Markov Decision Processes [J].
Haklidir, Mehmet ;
Temeltas, Hakan .
IEEE ACCESS, 2021, 9 :159672-159683
[39]   Model-free reinforcement learning for motion planning of autonomous agents with complex tasks in partially observable environments [J].
Li, Junchao ;
Cai, Mingyu ;
Kan, Zhen ;
Xiao, Shaoping .
AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2024, 38 (01)
[40]   Learning to Act Optimally in Partially Observable Multiagent Settings [J].
Ceren, Roi .
AAMAS'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS, 2016, :1532-1533