Fidelity-Based Probabilistic Q-Learning for Control of Quantum Systems

Cited by: 117
Authors
Chen, Chunlin [1 ,2 ]
Dong, Daoyi [3 ,4 ]
Li, Han-Xiong [5 ]
Chu, Jian [6 ]
Tarn, Tzyh-Jong [7 ]
Affiliations
[1] Nanjing Univ, Dept Control & Syst Engn, Nanjing 210093, Jiangsu, Peoples R China
[2] Princeton Univ, Dept Chem, Princeton, NJ 08544 USA
[3] Univ New S Wales, Sch Engn & Informat Technol, Australian Def Force Acad, Canberra, ACT 2600, Australia
[4] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou 310027, Zhejiang, Peoples R China
[5] City Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
[6] Zhejiang Univ, Inst Cyber Syst & Control, State Key Lab Ind Control Technol, Hangzhou 310027, Zhejiang, Peoples R China
[7] Washington Univ, Dept Elect & Syst Engn, St Louis, MO 63130 USA
Funding
Australian Research Council;
Keywords
Fidelity; probabilistic Q-learning; quantum control; reinforcement learning; POLICY ITERATION; ALGORITHM; STRATEGY; DESIGN;
DOI
10.1109/TNNLS.2013.2283574
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
The balance between exploration and exploitation is a key problem in reinforcement learning, especially in Q-learning. In this paper, a fidelity-based probabilistic Q-learning (FPQL) approach is presented to solve this problem naturally and is applied to learning control of quantum systems. In this approach, fidelity is adopted to help direct the learning process, and the probability of selecting each action at a given state is updated iteratively as learning proceeds, which yields a natural exploration strategy rather than a fixed one with hand-tuned parameters. A probabilistic Q-learning (PQL) algorithm is first presented to demonstrate the basic idea of probabilistic action selection. The FPQL algorithm is then presented for learning control of quantum systems. Two examples (a spin-1/2 system and a Lambda-type atomic system) are used to test the performance of the FPQL algorithm. The results show that FPQL attains a better balance between exploration and exploitation, avoids locally optimal policies, and accelerates the learning process.
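The probabilistic action selection described in the abstract can be sketched as follows. This is a minimal illustration based only on the abstract: each state keeps a probability distribution over actions that is derived iteratively from the Q-values, so exploration emerges from the distribution itself rather than from a tuned epsilon parameter, and a fidelity term can be folded into the reward. The function names, the softmax form of the distribution, and the reward shaping are assumptions for illustration, not the paper's exact FPQL update rules.

```python
import math
import random

def action_probs(q_values, temperature=1.0):
    """Map a state's Q-values to action-selection probabilities
    (a softmax here, as one plausible probabilistic rule)."""
    exps = [math.exp(q / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def select_action(probs, rng=random):
    """Sample an action index according to the current probabilities,
    so exploration happens without an explicit epsilon schedule."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def q_update(q, state, action, reward, next_q_values, alpha=0.1, gamma=0.9):
    """Standard Q-learning update; in a fidelity-based variant the
    reward would include a fidelity term (e.g. reward + w * fidelity)."""
    target = reward + gamma * max(next_q_values)
    q[state][action] += alpha * (target - q[state][action])
```

As the Q-values for a state grow apart during learning, `action_probs` concentrates mass on the better actions, which is the "natural" shift from exploration to exploitation that the abstract describes.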
Pages: 920-933
Page count: 14