Conditioning and time representation in long short-term memory networks

Cited by: 0
Authors
Francois Rivest
John F. Kalaska
Yoshua Bengio
Affiliations
[1] Royal Military College of Canada,Department of Mathematics and Computer Science
[2] Queen’s University,Centre for Neuroscience Studies
[3] University of Montreal,Department of Physiology
[4] University of Montreal,Department of Computer Science and Operations Research
Source
Biological Cybernetics | 2014 / Vol. 108
Keywords
Time representation learning; Temporal-difference learning; Long short-term memory networks; Dopamine; Conditioning; Reinforcement learning;
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Dopaminergic models based on the temporal-difference learning algorithm usually do not differentiate trace from delay conditioning. Instead, they use a fixed temporal representation of elapsed time since conditioned stimulus onset. Recently, a new model was proposed in which timing is learned within a long short-term memory (LSTM) artificial neural network representing the cerebral cortex (Rivest et al. in J Comput Neurosci 28(1):107–130, 2010). This paper evaluates that model's ability to reproduce and explain relevant data, as well as its ability to make interesting new predictions. The model reveals a strikingly different temporal representation between trace and delay conditioning, since trace conditioning requires working memory to remember the past conditioned stimulus while delay conditioning does not. On the other hand, the model predicts no important difference in dopamine (DA) responses between those two conditions when trained on one conditioning paradigm and tested on the other. The model predicts that in trace conditioning, animal timing starts with the conditioned stimulus offset as opposed to its onset. In classical conditioning, it predicts that if the conditioned stimulus does not disappear after the reward, the animal may expect a second reward. Finally, the last simulation reveals that the buildup of activity of some units in the network can adapt to new delays by adjusting their rate of integration. Most importantly, the paper shows that it is possible, with the proposed architecture, to acquire discharge patterns similar to those observed in dopaminergic neurons and in the cerebral cortex on those tasks simply by minimizing a predictive cost function.
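For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of temporal-difference learning with a fixed representation of elapsed time since conditioned stimulus onset. It is not the LSTM model evaluated in the paper. The function name, the one-hot "tapped delay line" encoding, the trial length, and all parameter values are assumptions chosen for illustration only. Because the features depend only on time since onset, the sketch has no way to tell whether the stimulus is still present, which is one way to see why such models do not distinguish trace from delay conditioning.

```python
import numpy as np

def train_td_delay_line(n_steps=20, cs_onset=5, reward_time=15,
                        n_trials=500, alpha=0.1, gamma=1.0):
    """TD(0) with a fixed one-hot code for elapsed time since CS onset.

    All names and default values here are illustrative assumptions.
    Returns the learned weights and the prediction errors of the last trial.
    """
    n_taps = n_steps - cs_onset      # one feature per step since CS onset
    w = np.zeros(n_taps)             # value weights, one per tap

    def features(t):
        # Before CS onset the feature vector is all zeros.
        x = np.zeros(n_taps)
        if cs_onset <= t < n_steps:
            x[t - cs_onset] = 1.0
        return x

    reward = np.zeros(n_steps)
    reward[reward_time] = 1.0        # single reward at a fixed delay

    deltas = np.zeros(n_steps)
    for _ in range(n_trials):
        for t in range(n_steps - 1):
            x_t, x_next = features(t), features(t + 1)
            v_t, v_next = w @ x_t, w @ x_next
            # TD error: reward received plus discounted next value,
            # minus the current value estimate.
            delta = reward[t + 1] + gamma * v_next - v_t
            w += alpha * delta * x_t
            deltas[t + 1] = delta    # error observed on arriving at t + 1
    return w, deltas

if __name__ == "__main__":
    _, first = train_td_delay_line(n_trials=1)
    _, last = train_td_delay_line()
    print("first trial: error at CS onset, at reward:", first[5], first[15])
    print("after training: error at CS onset, at reward:", last[5], last[15])
```

Under these assumptions, the prediction error on the first trial appears at the time of reward, and after training it appears at conditioned stimulus onset instead, which is the qualitative pattern usually compared with dopaminergic responses.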
Pages: 23–48
Page count: 25