Toward robust and scalable deep spiking reinforcement learning

Cited by: 11
Authors
Akl, Mahmoud [1 ]
Ergene, Deniz [1 ]
Walter, Florian [1 ]
Knoll, Alois [1 ]
Affiliations
[1] Tech Univ Munich, Chair Robot, TUM Sch Computat Informat & Technol, Artificial Intelligence & Embedded Syst, Munich, Germany
Keywords
spiking neural network (SNN); reinforcement learning; deep reinforcement learning (Deep RL); continuous control; hyperparameter tuning
DOI
10.3389/fnbot.2022.1075647
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep reinforcement learning (DRL) combines reinforcement learning algorithms with deep neural networks (DNNs). Spiking neural networks (SNNs) have been shown to be a biologically plausible and energy-efficient alternative to DNNs. Since the introduction of surrogate gradient approaches, which made it possible to overcome the discontinuity of the spike function, SNNs can now be trained with the backpropagation through time (BPTT) algorithm. While largely explored on supervised learning problems, little work has been done on investigating the use of SNNs as function approximators in DRL. Here we show how SNNs can be applied to different DRL algorithms like Deep Q-Network (DQN) and Twin-Delayed Deep Deterministic Policy Gradient (TD3) for discrete and continuous action space environments, respectively. We found that SNNs are sensitive to the additional hyperparameters introduced by spiking neuron models, such as current and voltage decay factors and firing thresholds, and that extensive hyperparameter tuning is unavoidable. However, we show that increasing the simulation time of SNNs, as well as applying a two-neuron encoding to the input observations, helps reduce the sensitivity to the membrane parameters. Furthermore, we show that randomizing the membrane parameters, instead of selecting uniform values for all neurons, has stabilizing effects on the training. We conclude that SNNs can be utilized for learning complex continuous control problems with state-of-the-art DRL algorithms. While the training complexity increases, the resulting SNNs can be directly executed on neuromorphic processors and potentially benefit from their high energy efficiency.
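Two of the techniques the abstract names, the two-neuron input encoding and the randomization of membrane parameters, can be sketched in a few lines. The snippet below is an illustrative sketch only: the function names, the decay-factor range, and the hard-reset rule are assumptions for the example, not the exact implementation used in the paper.

```python
import numpy as np

def two_neuron_encode(obs):
    """Two-neuron encoding: each observation dimension is split into a
    positive-part channel and a negative-part channel, so every input
    current fed to the SNN is non-negative."""
    obs = np.asarray(obs, dtype=float)
    return np.concatenate([np.maximum(obs, 0.0), np.maximum(-obs, 0.0)])

class LIFLayer:
    """Leaky integrate-and-fire layer with per-neuron randomized voltage
    decay factors (instead of one uniform value for all neurons)."""

    def __init__(self, n, rng, decay_range=(0.7, 0.95), threshold=1.0):
        # Draw a different decay factor for each neuron (assumed range).
        self.alpha = rng.uniform(*decay_range, size=n)
        self.threshold = threshold
        self.v = np.zeros(n)  # membrane potentials

    def step(self, input_current):
        # Leaky integration of the input current.
        self.v = self.alpha * self.v + input_current
        spikes = (self.v >= self.threshold).astype(float)
        # Hard reset of neurons that spiked.
        self.v = np.where(spikes > 0, 0.0, self.v)
        return spikes
```

During a rollout, an observation would first pass through `two_neuron_encode` and then be presented to the network for several simulation steps, with the output spike counts read out as Q-values or actions.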
Pages: 11