A META-PRECONDITIONING APPROACH FOR DEEP Q-LEARNING

Cited: 0
Authors
Evmorfos, Spilios [1 ]
Petropulu, Athina P. [1 ]
Affiliations
[1] Rutgers State Univ, Dept Elect & Comp Engn, New Brunswick, NJ 08854 USA
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Keywords
deep Q-learning; reinforcement learning; meta-learning; Neural Tangent Kernels;
DOI
10.1109/ICASSP48485.2024.10446137
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline codes
070206; 082403
Abstract
Deep Q-learning stands as an integral component within modern deep reinforcement learning algorithms. Notwithstanding its recent successes, deep Q-learning can be susceptible to instability and divergence, especially when combined with off-policy learning and bootstrapping, a combination also referred to as the "deadly triad". The current work introduces a novel learning process that aligns with the flow of gradient-based meta-learning algorithms and is designed to be performed prior to the application of deep Q-learning. The primary goal of the proposed learning process is to instill favorable generalization properties in the Q-function approximator by conditioning its corresponding Neural Tangent Kernel. The proposed approach is applied to a sample of environments from the DeepMind Control Suite and yields an improvement of about 15% in average reward accumulation.
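
The abstract describes a gradient-based, meta-learning-style pre-conditioning phase performed before deep Q-learning begins. The following is a minimal illustrative sketch of such a phase, written as a first-order MAML-style pre-training loop in PyTorch; it is not the authors' algorithm. The network, task generator, and hyperparameters (QNetwork, make_synthetic_task, inner_lr, meta_lr) are assumptions for illustration, and the paper's actual meta-objective conditions the Q-function's Neural Tangent Kernel rather than this simplified regression loss.

# Illustrative sketch only: a first-order, MAML-style pre-conditioning loop
# run before Q-learning. All names and hyperparameters are hypothetical.
import copy
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small MLP Q-function approximator: state -> Q-value per action."""
    def __init__(self, state_dim=4, num_actions=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, s):
        return self.net(s)

def make_synthetic_task(state_dim=4, num_actions=2, batch=32):
    """Hypothetical task generator: random states with random regression targets."""
    return torch.randn(batch, state_dim), torch.randn(batch, num_actions)

def meta_precondition(q_net, meta_steps=100, inner_lr=1e-2, meta_lr=1e-3):
    """Adapt a copy of the network on each sampled task, then move the shared
    initialization toward parameters that generalize after one gradient step."""
    meta_opt = torch.optim.Adam(q_net.parameters(), lr=meta_lr)
    loss_fn = nn.MSELoss()
    for _ in range(meta_steps):
        support_s, support_y = make_synthetic_task()
        query_s, query_y = make_synthetic_task()
        # Inner adaptation on a temporary copy of the network.
        adapted = copy.deepcopy(q_net)
        inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
        inner_opt.zero_grad()
        loss_fn(adapted(support_s), support_y).backward()
        inner_opt.step()
        # Outer (meta) loss evaluated after adaptation; first-order update:
        # copy the adapted network's gradients onto the shared initialization.
        adapted.zero_grad()
        loss_fn(adapted(query_s), query_y).backward()
        meta_opt.zero_grad()
        for p, p_adapted in zip(q_net.parameters(), adapted.parameters()):
            p.grad = p_adapted.grad.clone()
        meta_opt.step()

# After pre-conditioning, q_net would be handed to a standard deep Q-learning loop.
q_net = QNetwork()
meta_precondition(q_net)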
Pages: 6485-6489
Number of pages: 5