A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback

Cited by: 188
Authors
Legenstein, Robert [1]
Pecevski, Dejan [1]
Maass, Wolfgang [1]
Affiliations
[1] Graz Univ Technol, Inst Theoret Comp Sci, A-8010 Graz, Austria
Funding
Austrian Science Fund;
Keywords
DOI
10.1371/journal.pcbi.1000180
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Reward-modulated spike-timing-dependent plasticity (STDP) has recently emerged as a candidate for a learning rule that could explain how behaviorally relevant adaptive changes in complex networks of spiking neurons could be achieved in a self-organizing manner through local synaptic plasticity. However, the capabilities and limitations of this learning rule could so far only be tested through computer simulations. This article provides tools for an analytic treatment of reward-modulated STDP, which allows us to predict under which conditions reward-modulated STDP will achieve a desired learning effect. These analytical results imply that neurons can learn through reward-modulated STDP to classify not only spatial but also temporal firing patterns of presynaptic neurons. They also can learn to respond to specific presynaptic firing patterns with particular spike patterns. Finally, the resulting learning theory predicts that even difficult credit-assignment problems, where it is very hard to tell which synaptic weights should be modified in order to increase the global reward for the system, can be solved in a self-organizing manner through reward-modulated STDP. This yields an explanation for a fundamental experimental result on biofeedback in monkeys by Fetz and Baker. In this experiment monkeys were rewarded for increasing the firing rate of a particular neuron in the cortex and were able to solve this extremely difficult credit assignment problem. Our model for this experiment relies on a combination of reward-modulated STDP with variable spontaneous firing activity. Hence it also provides a possible functional explanation for trial-to-trial variability, which is characteristic for cortical networks of neurons but has no analogue in currently existing artificial computing systems. 
In addition, our model demonstrates that reward-modulated STDP can be applied to all synapses in a large recurrent neural network without endangering the stability of the network dynamics.
Pages: 27