A META-PRECONDITIONING APPROACH FOR DEEP Q-LEARNING

Cited: 0
Authors
Evmorfos, Spilios [1 ]
Petropulu, Athina P. [1 ]
Affiliations
[1] Rutgers State Univ, Dept Elect & Comp Engn, New Brunswick, NJ 08854 USA
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Keywords
deep Q-learning; reinforcement learning; meta-learning; Neural Tangent Kernels;
DOI
10.1109/ICASSP48485.2024.10446137
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline codes
070206; 082403
Abstract
Deep Q-learning stands as an integral component within modern deep reinforcement learning algorithms. Notwithstanding its recent successes, deep Q-learning can be susceptible to instability and divergence, especially when combined with off-policy learning and bootstrapping, a combination also referred to as the "deadly triad". The current work introduces a novel learning process that aligns with the flow of gradient-based meta-learning algorithms and is designed to be performed prior to the application of deep Q-learning. The primary goal of the proposed learning process is to instill favorable generalization properties in the Q-function approximator by conditioning its corresponding Neural Tangent Kernel. The proposed approach is applied to a sample of environments from the DeepMind Control Suite and yields an improvement of about 15% in average reward accumulation.
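
The abstract describes a gradient-based, meta-learning-style pre-conditioning phase performed before deep Q-learning begins. The following is a minimal illustrative sketch of such a phase, written as a first-order MAML-style pre-training loop in PyTorch; it is not the authors' algorithm. The network, task generator, and hyperparameters (QNetwork, make_synthetic_task, inner_lr, meta_lr) are assumptions for illustration, and the paper's actual meta-objective conditions the Q-function's Neural Tangent Kernel rather than this simplified regression loss.

# Illustrative sketch only: a first-order, MAML-style pre-conditioning loop
# run before Q-learning. All names and hyperparameters are hypothetical.
import copy
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small MLP Q-function approximator: state -> Q-value per action."""
    def __init__(self, state_dim=4, num_actions=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, s):
        return self.net(s)

def make_synthetic_task(state_dim=4, num_actions=2, batch=32):
    """Hypothetical task generator: random states with random regression targets."""
    return torch.randn(batch, state_dim), torch.randn(batch, num_actions)

def meta_precondition(q_net, meta_steps=100, inner_lr=1e-2, meta_lr=1e-3):
    """Adapt a copy of the network on each sampled task, then move the shared
    initialization toward parameters that generalize after one gradient step."""
    meta_opt = torch.optim.Adam(q_net.parameters(), lr=meta_lr)
    loss_fn = nn.MSELoss()
    for _ in range(meta_steps):
        support_s, support_y = make_synthetic_task()
        query_s, query_y = make_synthetic_task()
        # Inner adaptation on a temporary copy of the network.
        adapted = copy.deepcopy(q_net)
        inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
        inner_opt.zero_grad()
        loss_fn(adapted(support_s), support_y).backward()
        inner_opt.step()
        # Outer (meta) loss evaluated after adaptation; first-order update:
        # copy the adapted network's gradients onto the shared initialization.
        adapted.zero_grad()
        loss_fn(adapted(query_s), query_y).backward()
        meta_opt.zero_grad()
        for p, p_adapted in zip(q_net.parameters(), adapted.parameters()):
            p.grad = p_adapted.grad.clone()
        meta_opt.step()

# After pre-conditioning, q_net would be handed to a standard deep Q-learning loop.
q_net = QNetwork()
meta_precondition(q_net)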
Pages: 6485-6489
Number of pages: 5