Statistical Mechanics of Reward-Modulated Learning in Decision-Making Networks

Cited by: 3

Authors:

Katahira, Kentaro [1,2,3]

Okanoya, Kazuo [1,3,4]

Okada, Masato [1,2,3]

Affiliations:

[1] Japan Sci & Technol Agcy, ERATO, Okanoya Emot Informat Project, Wako, Saitama 3510198, Japan

[2] Univ Tokyo, Grad Sch Frontier Sci, Chiba 2775861, Japan

[3] RIKEN Brain Sci Inst, Wako, Saitama 3510198, Japan

[4] Univ Tokyo, Grad Sch Arts & Sci, Tokyo 1538902, Japan

Source:

NEURAL COMPUTATION | 2012 / Vol. 24 / Issue 05

Keywords:

DEPENDENT SYNAPTIC PLASTICITY; MATCHING LAW; CORTICAL CIRCUITS; REINFORCEMENT; BEHAVIOR; MODEL; CORTEX; CHOICE; CELLS;

DOI:

10.1162/NECO_a_00264

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The neural substrates of decision making have been intensively studied using experimental and computational approaches. Alternative-choice tasks with reinforcement have often been employed in investigations of decision making. In many experiments, choice behavior has been found empirically to follow Herrnstein's matching law. A number of theoretical studies have sought to explain the mechanisms responsible for matching behavior. These studies have proved that various learning rules achieve matching behavior as a steady state of the learning process. The models in these studies have only a few parameters. However, a large number of neurons and synapses are expected to participate in decision making in the brain. We investigated learning behavior in simple but large-scale decision-making networks. We considered the covariance learning rule, which has been demonstrated to achieve matching behavior as a steady state (Loewenstein & Seung, 2006). We analyzed model behavior in a thermodynamic limit where the number of plastic synapses goes to infinity. Using techniques from statistical mechanics, we derive deterministic differential equations in this limit for the order parameters, which allow an exact calculation of the evolution of choice behavior. As a result, we found that matching behavior cannot be a steady state of learning when the fluctuations in input from individual sensory neurons are so large that they affect the net input to value-encoding neurons. This situation naturally arises when the synaptic strength is sufficiently strong and the excitatory and inhibitory inputs to the value-encoding neurons are balanced. The deviation from matching behavior is caused by increasing variance in the input potential due to the diffusion of synaptic efficacies. This effect causes an undermatching phenomenon, which has often been observed in behavioral experiments.
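As a rough illustration of two ideas named in the abstract, the Python sketch below computes a choice fraction under the matching law and applies one step of a covariance-style weight update. It is not the authors' network model. The function names, the `sensitivity` and `bias` parameters (borrowed from the generalized matching law, which this record does not mention), and the particular update form dw_i = eta * (R - <R>) * x_i are all assumptions made for illustration.

```python
def matching_choice_fraction(income_a, income_b, sensitivity=1.0, bias=1.0):
    """Fraction of choices allocated to option A.

    With sensitivity=1 and bias=1 this is strict matching: the choice
    fraction equals the income fraction. A sensitivity below 1 gives
    undermatching (choices pulled toward 50/50). Both parameters are
    illustrative assumptions, not taken from the paper.
    """
    ratio = bias * (income_a / income_b) ** sensitivity
    return ratio / (1.0 + ratio)


def covariance_update(weights, activity, reward, mean_reward, learning_rate=0.01):
    """One hypothetical covariance-rule step: dw_i = eta * (R - <R>) * x_i.

    `activity` holds presynaptic activities x_i, `reward` is the reward R
    on this trial, and `mean_reward` is a running estimate of <R>.
    """
    delta = learning_rate * (reward - mean_reward)
    return [w + delta * x for w, x in zip(weights, activity)]


if __name__ == "__main__":
    # Strict matching versus undermatching for a 3:1 income ratio.
    print(matching_choice_fraction(3.0, 1.0))
    print(matching_choice_fraction(3.0, 1.0, sensitivity=0.5))
    # A single weight update after a better-than-average reward.
    print(covariance_update([0.0, 0.0, 0.0], [1.0, 0.0, 2.0], 1.0, 0.5, 0.1))
```

In this sketch, an above-average reward strengthens the synapses that were active on the trial, and a below-average reward weakens them; inactive synapses are left unchanged.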


Pages: 1230-1270

Number of pages: 41

