Statistical Mechanics of the Delayed Reward-Based Learning with Node Perturbation

Cited by: 1

Authors:

Saito, Hiroshi ^{[1]}

Katahira, Kentaro ^{[1,2,3]}

Okanoya, Kazuo ^{[2,3]}

Okada, Masato ^{[1,2,3]}

Affiliations:

[1] Univ Tokyo, Grad Sch Frontier Sci, Dept Complex Sci & Engn, Chiba 2778561, Japan

[2] RIKEN Brain Sci Inst, Wako, Saitama 3510198, Japan

[3] Japan Sci & Technol Agcy, ERATO, Okanoya Emot Informat Project, Tokyo, Japan

Source:

JOURNAL OF THE PHYSICAL SOCIETY OF JAPAN | 2010, Vol. 79, No. 06

Keywords:

statistical mechanics; delayed reward; eligibility trace; node perturbation; reward-based learning; NETWORKS; SEQUENCE; SONGBIRD; MODEL;

DOI:

10.1143/JPSJ.79.064003

Chinese Library Classification:

O4 [Physics];

Discipline classification code:

0702 ;

Abstract:

In reward-based learning, a reward is typically given with some delay after the behavior that causes it. In the machine learning literature, the eligibility trace has been used as one way to handle delayed reward in reinforcement learning. Recent studies suggest that the eligibility trace is also important for a difficult neuroscience problem known as the "distal reward problem". Node perturbation is a stochastic gradient method, one of many implementations of reinforcement learning, that estimates an approximate gradient by introducing perturbations into a network. Since the stochastic gradient method does not require the derivative of an objective function, it is expected to be able to account for the learning mechanism of a complex system such as the brain. We study node perturbation with the eligibility trace as a specific example of delayed reward-based learning and analyze it using a statistical mechanics approach. As a result, we show the optimal time constant of the eligibility trace with respect to the reward delay and the existence of unlearnable parameter configurations.
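The mechanism the abstract describes can be illustrated with a small toy simulation. The sketch below is not the paper's model or its equations: it assumes a single linear student unit learning from a linear teacher, a squared-error signal used as the (negative) reward, and an exponentially decaying trace. The function name `train` and all parameter names and values (`delay`, `decay`, `eta`, `sigma`) are hypothetical choices made for illustration only.

```python
import numpy as np
from collections import deque


def train(n_inputs=10, delay=2, decay=0.7, eta=0.02, sigma=0.01,
          n_steps=5000, seed=0):
    """Toy node perturbation with a delayed reward and an eligibility trace.

    Illustrative assumptions only: linear teacher and student, squared error
    as the reward signal, exponential trace. Not the paper's exact rule.
    """
    rng = np.random.default_rng(seed)
    teacher = rng.normal(size=n_inputs)   # fixed target weights
    student = np.zeros(n_inputs)          # weights being learned
    trace = np.zeros(n_inputs)            # eligibility trace
    pending = deque()                     # reward signals not yet delivered
    errors = []

    for _ in range(n_steps):
        x = rng.normal(size=n_inputs) / np.sqrt(n_inputs)
        xi = rng.normal(0.0, sigma)       # perturbation added to the output node
        y = student @ x
        target = teacher @ x
        base_err = 0.5 * (y - target) ** 2
        pert_err = 0.5 * (y + xi - target) ** 2

        # The trace keeps a decaying memory of (perturbation * input) products.
        trace = decay * trace + xi * x

        # The error difference becomes available only `delay` steps later.
        pending.append(pert_err - base_err)
        if len(pending) > delay:
            reward = pending.popleft()
            # Delayed signal times the trace approximates a gradient step.
            student -= eta * reward * trace / sigma**2

        errors.append(0.5 * np.sum((teacher - student) ** 2))
    return np.array(errors)


if __name__ == "__main__":
    errs = train()
    print("initial error:", errs[0], "final error:", errs[-1])
```

In this toy setting, the trace term that matches the delayed signal is weighted by `decay ** delay`, while the other terms in the trace act as noise. This gives some intuition for why the trace time constant has to be chosen relative to the reward delay, which is the trade-off the abstract says the paper analyzes.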


Pages: 6
