Statistical Mechanics of Reward-Modulated Learning in Decision-Making Networks

Cited by: 3
Authors
Katahira, Kentaro [1, 2, 3]
Okanoya, Kazuo [1, 3, 4]
Okada, Masato [1, 2, 3]
Affiliations
[1] Japan Sci & Technol Agcy, ERATO, Okanoya Emot Informat Project, Wako, Saitama 3510198, Japan
[2] Univ Tokyo, Grad Sch Frontier Sci, Chiba 2775861, Japan
[3] RIKEN Brain Sci Inst, Wako, Saitama 3510198, Japan
[4] Univ Tokyo, Grad Sch Arts & Sci, Tokyo 1538902, Japan
Keywords
DEPENDENT SYNAPTIC PLASTICITY; MATCHING LAW; CORTICAL CIRCUITS; REINFORCEMENT; BEHAVIOR; MODEL; CORTEX; CHOICE; CELLS
DOI
10.1162/NECO_a_00264
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The neural substrates of decision making have been intensively studied using experimental and computational approaches. Alternative-choice tasks accompanying reinforcement have often been employed in investigations into decision making. Choice behavior has been empirically found in many experiments to follow Herrnstein's matching law. A number of theoretical studies have been done on explaining the mechanisms responsible for matching behavior. Various learning rules have been proved in these studies to achieve matching behavior as a steady state of learning processes. The models in the studies have consisted of a few parameters. However, a large number of neurons and synapses are expected to participate in decision making in the brain. We investigated learning behavior in simple but large-scale decision-making networks. We considered the covariance learning rule, which has been demonstrated to achieve matching behavior as a steady state (Loewenstein & Seung, 2006). We analyzed model behavior in a thermodynamic limit where the number of plastic synapses went to infinity. By means of techniques of the statistical mechanics, we can derive deterministic differential equations in this limit for the order parameters, which allow an exact calculation of the evolution of choice behavior. As a result, we found that matching behavior cannot be a steady state of learning when the fluctuations in input from individual sensory neurons are so large that they affect the net input to value-encoding neurons. This situation naturally arises when the synaptic strength is sufficiently strong and the excitatory input and the inhibitory input to the value-encoding neurons are balanced. The deviation from matching behavior is caused by increasing variance in the input potential due to the diffusion of synaptic efficacies. This effect causes an undermatching phenomenon, which has been often observed in behavioral experiments.
Pages: 1230-1270
Page count: 41