Reinforcement learning is a powerful framework for modelling the cognitive and neural substrates of learning and decision making. Contemporary research in cognitive neuroscience and neuroeconomics typically uses value-based reinforcement-learning models, which assume that decision-makers choose by comparing learned values for different actions. However, another possibility is suggested by a simpler family of models, called policy-gradient reinforcement learning. Policy-gradient models learn by optimizing a behavioral policy directly, without the intermediate step of value-learning. Here we review recent behavioral and neural findings that are more parsimoniously explained by policy-gradient models than by value-based models. We conclude that, despite the ubiquity of 'value' in reinforcement-learning models of decision making, policy-gradient models provide a lightweight and compelling alternative account of operant behavior.
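The distinction the abstract draws can be made concrete with a minimal sketch: in a two-armed bandit, a value-based learner updates an estimated value for each action and chooses by comparing those values, whereas a policy-gradient (REINFORCE-style) learner adjusts its policy parameters directly from reward, with no value estimates. This is an illustrative toy, not the authors' model; the reward probabilities, learning rate, and softmax choice rule are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, alpha, trials = 2, 0.1, 2000
p_reward = np.array([0.3, 0.7])  # assumed reward probabilities per arm

Q = np.zeros(n_actions)      # value-based learner: learned action values
theta = np.zeros(n_actions)  # policy-gradient learner: policy parameters only

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(trials):
    # Value-based (delta-rule) update: learn values, choose by comparing them.
    a = rng.choice(n_actions, p=softmax(Q))
    r = float(rng.random() < p_reward[a])
    Q[a] += alpha * (r - Q[a])  # reward-prediction-error update of the value

    # Policy-gradient (REINFORCE-style) update: no values, adjust policy directly.
    pi = softmax(theta)
    a = rng.choice(n_actions, p=pi)
    r = float(rng.random() < p_reward[a])
    grad = -pi          # d log pi(a) / d theta for all actions...
    grad[a] += 1.0      # ...plus 1 for the chosen action
    theta += alpha * r * grad  # ascend the reward gradient in policy space

# Both learners come to prefer the richer arm, but only the first
# represents anything that looks like a learned 'value'.
```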
Institutions:
Stanford Univ, Neurosci Grad Training Program, Stanford, CA 94305 USA
Univ Calif Berkeley, Helen Wills Neurosci Inst, Berkeley, CA 94720 USA
Arizona State Univ, Dept Psychol, Tempe, AZ 85287 USA
Authors:
Ballard, Ian C.
McClure, Samuel M.