Operant conditioning in skinnerbots

Cited by: 54
Authors
Touretzky, DS
Saksida, LM
Affiliations
[1] CARNEGIE MELLON UNIV, CTR NEURAL BASIS COGNIT, PITTSBURGH, PA 15213
[2] CARNEGIE MELLON UNIV, INST ROBOT, PITTSBURGH, PA 15213
Keywords
operant conditioning; instrumental learning; shaping; chaining; learning robots
DOI
10.1177/105971239700500302
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Instrumental (or operant) conditioning, a form of animal learning, is similar to reinforcement learning (Watkins, 1989) in that it allows an agent to adapt its actions to gain maximally from the environment while being rewarded only for correct performance. However, animals learn much more complicated behaviors through instrumental conditioning than robots presently acquire through reinforcement learning. We describe a new computational model of the conditioning process that attempts to capture some of the aspects that are missing from simple reinforcement learning: conditioned reinforcers, shifting reinforcement contingencies, explicit action sequencing, and state space refinement. We apply our model to a task commonly used to study working memory in rats and monkeys: the delayed match-to-sample task. Animals learn this task in stages. In simulation, our model also acquires the task in stages, in a similar manner. We have used the model to train an RWI B21 robot.
Pages: 219-247
Page count: 29
Related papers
51 in total
[1] [Anonymous], 1989, PSYCHOL LEARNING BEH
[2] ASADA M, 1994, P INT C INT ROB SYST
[3] BARNETT SA, 1981, MODERN ETHOLOGY
[4] Barto A.G., 1990, Learning and Computational Neuroscience
[5] BAXTER DA, 1991, NEURAL NETWORK MODEL
[6] BERRIDGE, KC; FENTRESS, JC; PARR, H. NATURAL SYNTAX RULES CONTROL ACTION SEQUENCE OF RATS [J]. BEHAVIOURAL BRAIN RESEARCH, 1987, 23 (01): 59-68
[7] BLOUGH, DS. DELAYED MATCHING IN THE PIGEON [J]. JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR, 1959, 2 (02): 151-160
[8] BRELAND, K; BRELAND, M. THE MISBEHAVIOR OF ORGANISMS [J]. AMERICAN PSYCHOLOGIST, 1961, 16 (11): 681-684
[9] BROWN, PL; JENKINS, HM. AUTO-SHAPING OF PIGEONS KEY-PECK [J]. JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR, 1968, 11 (01): 1-&
[10] BUSSEY TJ, 1994, NEUROSCI RES COMMUN, V15, P103