Deep reinforcement learning for static noisy state feedback control with reward estimation

Cited: 0
Authors
Wang, Ran [1 ]
Kashima, Kenji [1 ]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
Keywords
Deep reinforcement learning; model-free; static noisy state feedback; reward estimation; partially observable MDP
DOI
10.1080/01691864.2025.2468215
Chinese Library Classification
TP24 [Robotics]
Discipline codes
080202; 1405
Abstract
Deep reinforcement learning (DRL) has demonstrated extraordinary capabilities in learning optimal policies for Markov decision processes (MDPs). However, when measurement noise affects the observation of the state, the problem transforms into a partially observable MDP (POMDP), which becomes nearly intractable when the system dynamics are unknown. To this end, we establish a reward estimation-based DRL algorithm to evaluate the long-term reward and learn a static (memoryless) noisy state feedback (SNSF) policy under additional assumptions. Numerical simulations validate the algorithm's effectiveness, and the related code is open-sourced at https://github.com/RanKyoto/RE-POMDP.
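To make the problem setting concrete, the following is a minimal, hypothetical sketch (not the paper's algorithm): a linear system whose state is seen only through additive Gaussian observation noise, controlled by a static (memoryless) noisy state feedback policy u = K y that depends solely on the current noisy observation. The matrices A, B, the gain K, and the noise level sigma are illustrative assumptions.

```python
import numpy as np

# Illustrative setup: state x observed as y = x + noise (a POMDP),
# with a memoryless policy u = K @ y acting on the noisy observation.
rng = np.random.default_rng(0)

A = np.array([[1.0, 0.1], [0.0, 1.0]])  # state-transition matrix
B = np.array([[0.0], [0.1]])            # input matrix
K = np.array([[-1.0, -1.5]])            # example static feedback gain (not learned)
sigma = 0.05                            # observation-noise standard deviation

x = np.array([1.0, 0.0])                # initial state
total_reward = 0.0
for _ in range(100):
    y = x + sigma * rng.standard_normal(2)   # noisy state observation
    u = K @ y                                # memoryless policy: uses only y
    total_reward += -(x @ x + float(u @ u))  # quadratic stage reward
    x = A @ x + (B @ u).ravel()              # state update

print(total_reward)
```

Because the policy never conditions on past observations, it is exactly the static noisy state feedback class discussed in the abstract; the paper's contribution is learning such a gain model-free via reward estimation when A and B are unknown.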
Pages: 259-272
Page count: 14
References
24 entries
[1]  
Albrecht S.V., 2024, Multi-Agent Reinforcement Learning: Foundations and Modern Approaches
[2]  
[Anonymous], 2016, OpenAI Gym
[3]  
Azizzadenesheli K., 2016, PMLR, P1639
[4]  
Chen Y.C., 2017, Biostatistics & Epidemiology, V1, P161, DOI 10.1080/24709360.2017.1396742
[5]  
Egorov M, 2017, J MACH LEARN RES, V18
[6]  
Haarnoja T, 2018, PR MACH LEARN RES, V80
[8]  
Li Y., Yin B., Xi H., 2011, Finding optimal memoryless policies of POMDPs under the expected average reward criterion, European Journal of Operational Research, V211, P556-567
[9]  
Lillicrap T.P., 2019, Continuous control with deep reinforcement learning
[10]  
Littman M. L., 1995, Proceedings of the Twelfth International Conference on Machine Learning, P362