Quantum reinforcement learning via policy iteration

Cited by: 0
Authors
El Amine Cherrat
Iordanis Kerenidis
Anupam Prakash
Affiliations
[1] Université de Paris
[2] CNRS
[3] IRIF
Source
Quantum Machine Intelligence | 2023, Vol. 5
Keywords
Quantum computing; Quantum machine learning; Reinforcement learning; Policy iteration
DOI
Not available
Abstract
Quantum computing has shown the potential to substantially speed up machine learning applications, in particular for supervised and unsupervised learning. Reinforcement learning, on the other hand, has become essential for solving many decision-making problems and policy iteration methods remain the foundation of such approaches. In this paper, we provide a general framework for performing quantum reinforcement learning via policy iteration. We validate our framework by designing and analyzing quantum policy evaluation methods for infinite-horizon discounted problems by building quantum states that approximately encode the value function of a policy, and quantum policy improvement methods by post-processing measurement outcomes on these quantum states. Last, we study the theoretical and experimental performance of our quantum algorithms on two environments from OpenAI’s Gym.
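For orientation, the policy evaluation and policy improvement steps named in the abstract can be illustrated with a minimal classical sketch of tabular policy iteration for an infinite-horizon discounted problem. This is not the paper's quantum algorithm, which instead builds quantum states that approximately encode the value function. The transition format `P[s][a]`, the function names, and the two-state `TOY_MDP` are all assumptions made up for this illustration.

```python
# Classical tabular policy iteration (illustrative sketch only; NOT the
# quantum method of the paper). Assumed toy format: P[s][a] is a list of
# (probability, next_state, reward) triples.

def evaluate_policy(P, policy, gamma, tol=1e-10):
    """Iteratively approximate the value function of a fixed policy."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            # Bellman expectation backup for the action the policy picks in s.
            v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def improve_policy(P, V, gamma):
    """Return the policy that is greedy with respect to V."""
    new_policy = []
    for s in range(len(P)):
        q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
             for a in range(len(P[s]))]
        new_policy.append(max(range(len(q)), key=q.__getitem__))
    return new_policy


def policy_iteration(P, gamma=0.9):
    """Alternate evaluation and improvement until the policy is stable."""
    policy = [0] * len(P)
    while True:
        V = evaluate_policy(P, policy, gamma)
        new_policy = improve_policy(P, V, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


# Hypothetical 2-state MDP: action 0 stays, action 1 switches state.
# Ending a step in state 1 yields reward 1; ending in state 0 yields 0.
TOY_MDP = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # transitions from state 0
    [[(1.0, 1, 1.0)], [(1.0, 0, 0.0)]],  # transitions from state 1
]

if __name__ == "__main__":
    policy, V = policy_iteration(TOY_MDP, gamma=0.9)
    print(policy, V)
```

In this toy problem the stable policy moves from state 0 to state 1 and then stays there, so both state values approach 1 / (1 - 0.9) = 10.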
Related papers
50 in total
  • [11] Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
    Johannes Fürnkranz
    Eyke Hüllermeier
    Weiwei Cheng
    Sang-Hyeun Park
    [J]. Machine Learning, 2012, 89 : 123 - 156
  • [12] Robust Control of An Inverted Pendulum System Based on Policy Iteration in Reinforcement Learning
    Ma, Yan
    Xu, Dengguo
    Huang, Jiashun
    Li, Yahui
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (24):
  • [13] Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration
    Cheng, Yuhu
    Chen, Lin
    Chen, C. L. Philip
    Wang, Xuesong
    [J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2021, 13 (04) : 1023 - 1032
  • [14] Dual policy iteration-reinforcement learning to optimize the detection quality of passive remote sensing device
    Guo, Rui
    Fu, Zhonghao
    [J]. SIGNAL PROCESSING, 2023, 209
  • [15] Q-LEARNING, POLICY ITERATION AND ACTOR-CRITIC REINFORCEMENT LEARNING COMBINED WITH METAHEURISTIC ALGORITHMS IN SERVO SYSTEM CONTROL
    Zamfirache, Iuliu Alexandru
    Precup, Radu-Emil
    Petriu, Emil M.
    [J]. FACTA UNIVERSITATIS-SERIES MECHANICAL ENGINEERING, 2023, 21 (04) : 615 - 630
  • [16] A reinforcement learning algorithm based on policy iteration for average reward: Empirical results with yield management and convergence analysis
    Gosavi, A
    [J]. MACHINE LEARNING, 2004, 55 (01) : 5 - 29
  • [17] Robotic Depalletizing via Reinforcement Learning of a Pushing Policy
    Dimou, Argyris
    Kiatos, Marios
    Malassiotis, Sotiris
    [J]. SUPPLY CHAINS, PT I, ICSC 2024, 2025, 2110 : 105 - 117
  • [18] A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis
    Abhijit Gosavi
    [J]. Machine Learning, 2004, 55 : 5 - 29
  • [19] Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning
    Zhang Y.
    Qu G.
    Xu P.
    Lin Y.
    Chen Z.
    Wierman A.
    [J]. Performance Evaluation Review, 2023, 51 (01) : 83 - 84
  • [20] Efficient relation extraction via quantum reinforcement learning
    Zhu, Xianchao
    Mu, Yashuang
    Wang, Xuetao
    Zhu, William
    [J]. COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (03) : 4009 - 4018