Reinforcement learning with augmented states in partially expectation and action observable environment

Cited by: 0

Authors:

Guirnaldo, SA [1]

Watanabe, K [1]

Izumi, K [1]

Kiguchi, K [1]

Affiliation:

[1] Saga Univ, Fac Engn Syst & Technol, Grad Sch Sci & Engn, Saga 8408502, Japan

Source:

SICE 2002: PROCEEDINGS OF THE 41ST SICE ANNUAL CONFERENCE, VOLS 1-5 | 2002

Keywords:

partially observable Markov decision processes; expectation; reinforcement learning; perception; perceptual aliasing

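DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812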

Abstract:

The problem of developing good or optimal policies for partially observable Markov decision processes (POMDPs) remains one of the most alluring areas of research in artificial intelligence. Inspired by how we (humans) form expectations from past experiences, and by how those expectations affect our decisions and behaviour, this paper proposes a reinforcement learning method called expectation and action augmented states (EAAS), aimed at discovering good or near-optimal policies in partially observable environments. The method uses the concept of expectation to distinguish between aliased states. It works by augmenting the agent's observation with its expectation of that observation. Two problems from the literature were used to test the proposed method. The results show promising characteristics of the method compared with some methods currently used in this domain.
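
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of augmenting an observation with an expectation and the previous action. All names here (`ExpectationModel`, `augment`, `q_update`) are hypothetical, and the frequency-based expectation and the tabular Q-learning update are assumptions chosen for illustration, not the authors' EAAS procedure.

```python
from collections import Counter, defaultdict


class ExpectationModel:
    """Hypothetical expectation: the next observation seen most often so far."""

    def __init__(self):
        # (observation, action) -> counts of the observations that followed
        self.counts = defaultdict(Counter)

    def update(self, obs, action, next_obs):
        self.counts[(obs, action)][next_obs] += 1

    def expect(self, obs, action):
        seen = self.counts.get((obs, action))
        # None means "no expectation yet" for an unseen (obs, action) pair
        return seen.most_common(1)[0][0] if seen else None


def augment(obs, expected_obs, prev_action):
    """Assumed augmented state: observation, what was expected, last action."""
    return (obs, expected_obs, prev_action)


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning step applied to augmented states."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Two hidden states emit the same observation "X" but are reached from
# different predecessors, so the agent arrives with different expectations.
model = ExpectationModel()
model.update("A", "right", "X")
model.update("B", "right", "Y")
model.update("B", "right", "Y")
model.update("B", "right", "X")

s_from_a = augment("X", model.expect("A", "right"), "right")
s_from_b = augment("X", model.expect("B", "right"), "right")

q = defaultdict(float)
q_update(q, s_from_a, "right", 1.0, s_from_b, ["left", "right"])
```

In this sketch the two arrivals at "X" index different rows of the Q-table because the expectation differs, which is one way an expectation could separate perceptually aliased states.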

Pages: 823-828

Page count: 6

