Deep reinforcement learning for static noisy state feedback control with reward estimation

Cited: 0
Authors
Wang, Ran [1 ]
Kashima, Kenji [1 ]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
Keywords
Deep reinforcement learning; model-free; static noisy state feedback; reward estimation; partially observable MDP
DOI
10.1080/01691864.2025.2468215
Chinese Library Classification
TP24 [Robotics]
Discipline codes
080202; 1405
Abstract
Deep reinforcement learning (DRL) has demonstrated extraordinary capabilities in learning optimal policies for Markov decision processes (MDPs). However, when measurement noise affects the observation of the state, the problem transforms into a partially observable MDP (POMDP), which becomes nearly intractable when the system dynamics are unknown. To this end, we establish a reward estimation-based DRL algorithm to evaluate the long-term reward and learn a static (memoryless) noisy state feedback (SNSF) policy under additional assumptions. Numerical simulations validate the algorithm's effectiveness, and the related code is open-sourced at https://github.com/RanKyoto/RE-POMDP.
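To make the problem setting concrete, the following is a minimal, hypothetical sketch (not the paper's algorithm): a linear system whose state is seen only through additive Gaussian observation noise, controlled by a static (memoryless) noisy state feedback policy u = K y that depends solely on the current noisy observation. The matrices A, B, the gain K, and the noise level sigma are illustrative assumptions.

```python
import numpy as np

# Illustrative setup: state x observed as y = x + noise (a POMDP),
# with a memoryless policy u = K @ y acting on the noisy observation.
rng = np.random.default_rng(0)

A = np.array([[1.0, 0.1], [0.0, 1.0]])  # state-transition matrix
B = np.array([[0.0], [0.1]])            # input matrix
K = np.array([[-1.0, -1.5]])            # example static feedback gain (not learned)
sigma = 0.05                            # observation-noise standard deviation

x = np.array([1.0, 0.0])                # initial state
total_reward = 0.0
for _ in range(100):
    y = x + sigma * rng.standard_normal(2)   # noisy state observation
    u = K @ y                                # memoryless policy: uses only y
    total_reward += -(x @ x + float(u @ u))  # quadratic stage reward
    x = A @ x + (B @ u).ravel()              # state update

print(total_reward)
```

Because the policy never conditions on past observations, it is exactly the static noisy state feedback class discussed in the abstract; the paper's contribution is learning such a gain model-free via reward estimation when A and B are unknown.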
Pages: 259-272
Page count: 14
References
24 entries
[1]  
Albrecht S.V., 2024, Multi-Agent Reinforcement Learning: Foundations and Modern Approaches
[2]  
[Anonymous], 2016, OpenAI Gym
[3]  
Azizzadenesheli K., 2016, PMLR, P1639
[4]  
Chen Y.C., 2017, Biostatistics & Epidemiology, V1, P161, DOI 10.1080/24709360.2017.1396742
[5]  
Egorov M, 2017, J MACH LEARN RES, V18
[6]  
Haarnoja T, 2018, PR MACH LEARN RES, V80
[8]  
Li Y., Yin B., Xi H., 2011, Finding optimal memoryless policies of POMDPs under the expected average reward criterion, European Journal of Operational Research, V211, P556-567
[9]  
Lillicrap T.P., 2019, Continuous control with deep reinforcement learning
[10]  
Littman M. L., 1995, Proceedings of the Twelfth International Conference on Machine Learning, P362